Advances in Biochemical Engineering/Biotechnology 162 
Series Editor: T. Scheper 


Huimin Zhao 
An-Ping Zeng Editors 


Synthetic Biology – Metabolic Engineering


Springer




Series editor 


T. Scheper, Hannover, Germany 


Editorial Board 


S. Belkin, Jerusalem, Israel 

T. Bley, Dresden, Germany 

J. Bohlmann, Vancouver, Canada 
M.B. Gu, Seoul, Korea (Republic of) 
W.-S. Hu, Minneapolis, Minnesota, USA 
B. Mattiasson, Lund, Sweden 

J. Nielsen, Gothenburg, Sweden 

H. Seitz, Potsdam, Germany 

R. Ulber, Kaiserslautern, Germany 
A.-P. Zeng, Hamburg, Germany 

J.-J. Zhong, Shanghai, Minhang, China 
W. Zhou, Shanghai, China 


Aims and Scope 


This book series reviews current trends in modern biotechnology and biochemical
engineering. Its aim is to cover all aspects of these interdisciplinary fields,
where knowledge, methods and expertise are required from chemistry, biochemistry,
microbiology, molecular biology, chemical engineering and computer science.


Volumes are organized topically and provide a comprehensive discussion of devel- 
opments in the field over the past 3-5 years. The series also discusses new 
discoveries and applications. Special volumes are dedicated to selected topics 
which focus on new biotechnological products and new processes for their synthe- 
sis and purification. 


In general, volumes are edited by well-known guest editors. The series editor and
publisher will, however, always be pleased to receive suggestions and supplementary
information. Manuscripts are accepted in English.


In references, Advances in Biochemical Engineering/Biotechnology is abbreviated 
as Adv. Biochem. Engin./Biotechnol. and cited as a journal. 


More information about this series at http://www.springer.com/series/10 


Huimin Zhao · An-Ping Zeng
Editors 


Synthetic Biology – Metabolic Engineering


With contributions by 


H.S. Alper · T. Baumann · J. Becker · N. Budisa ·
G.-Q. Chen · M. Deaner · M. Exner · X. Feng ·
E. Garcia-Ruiz · D. Gerngross · G. Gießelmann · W. Guo ·
M. HamediRad · S.L. Hoffmann · Y.-S. Jin · H. Kim ·
I.I. Kong · J.-J. Liu · C.-W. Ma · D.-C. Meng · G. Morgado ·
S. Panke · L. Pei · T.M. Roberts · M. Schmidt · J. Sheng ·
T.L. Turner · C. Wittmann · A.-P. Zeng · G.-C. Zhang ·
H. Zhao · L.-B. Zhou


Springer


Editors 


Huimin Zhao
Department of Chemical and Biomolecular Engineering
University of Illinois
Urbana, Illinois
USA

An-Ping Zeng
Technische Universität Hamburg-Harburg
Institut für Bioprozess- und Biosystemtechnik
Hamburg, Germany

ISSN 0724-6145 ISSN 1616-8542 (electronic) 

Advances in Biochemical Engineering/Biotechnology 

ISBN 978-3-319-55317-7 ISBN 978-3-319-55318-4 (eBook) 


DOI 10.1007/978-3-319-55318-4
Library of Congress Control Number: 2017952559 


© Springer International Publishing AG 2018 

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of 
the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, 
recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission 
or information storage and retrieval, electronic adaptation, computer software, or by similar or 
dissimilar methodology now known or hereafter developed. 

The use of general descriptive names, registered names, trademarks, service marks, etc. in this 
publication does not imply, even in the absence of a specific statement, that such names are exempt 
from the relevant protective laws and regulations and therefore free for general use. 

The publisher, the authors and the editors are safe to assume that the advice and information in this 
book are believed to be true and accurate at the date of publication. Neither the publisher nor the 
authors or the editors give a warranty, express or implied, with respect to the material contained 
herein or for any errors or omissions that may have been made. The publisher remains neutral with 
regard to jurisdictional claims in published maps and institutional affiliations. 


Printed on acid-free paper 
This Springer imprint is published by Springer Nature 


The registered company is Springer International Publishing AG 
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland 


Preface: Exploring the Synergy Between 
Synthetic Biology and Metabolic Engineering 


Synthetic biology involves the use of engineering principles to design biological 
parts and systems with new or improved properties, whereas metabolic engineering 
focuses on the engineering of microbial cell factories for the production of fuels and 
chemicals using recombinant DNA technologies (Zhao 2013). Both fields have 
been growing quickly in recent years. In particular, synthetic biology tools have 
increasingly been used to address scientific and technical challenges in metabolic 
engineering. In fact, a recent report by the National Research Council of the 
National Academies of Sciences, Engineering, and Medicine in the United States 
describes a roadmap for accelerating the development of industrial processes 
for production of chemicals using synthetic biology tools (“Industrialization of 
Biology,” 2015). 

This volume of Advances in Biochemical Engineering/Biotechnology explores 
the synergy between synthetic biology and metabolic engineering. It contains a 
total of ten reviews written by world-leading experts; roughly half of these reviews
focus on tool development mainly in the synthetic biology area, and the other half 
focus on the application of synthetic biology and metabolic engineering tools for 
the design, engineering, and evolution of microbial cells for production of a wide 
variety of chemicals, materials, and fuels. In the Tool Development section, Budisa 
and coworkers summarize the development and application of the pyrrolysine- 
based system for orthogonal protein translation, a process which produces proteins 
containing noncanonical amino acids at specific sites. Alper and Deaner report the 
various strategies for the discovery and engineering of promoters and terminators 
with desired characteristics for controlling gene expression. Special attention is 
paid to the rational design of synthetic promoters and terminators. Zeng and 
coworkers describe the recent advances in the development of biomolecular 
switches or in vivo biosensors and their applications for dynamic regulation of 
metabolic pathways. Zhao and coworkers summarize various strategies recently 
developed for the design, engineering, and optimization of biochemical pathways 
for the microbial production of chemicals. Both computational algorithms used to 
design efficient metabolic routes and experimental tools to construct and improve 
the efficiency of the designed pathways are discussed. Complementing the review
by Zhao and coworkers, Panke and coworkers describe strategies to design novel 
biochemical pathways for in vitro applications such as the multi-step enzymatic 
synthesis of chemicals. 

In the Practical Application section, Chen and Meng discuss the application of
synthetic biology tools to the metabolic engineering of bacteria for the
cost-effective production of polyhydroxyalkanoates, a family of biodegradable and
biocompatible polyesters. Jin and coworkers provide an overview of recent advances
in the engineering and evolution of Saccharomyces cerevisiae for the production of
biofuels and chemicals. In a related review, Wittmann and coworkers highlight
the application of systems biology and synthetic biology in the engineering of
Corynebacterium glutamicum for industrial production of chemicals. Feng and
coworkers report the application of 13C metabolic flux analysis to identify and
tackle the rate-limiting steps in metabolic pathways to improve the production of
chemicals and fuels.

Synthetic biology, and especially its subfield xenobiology, which aims at changing
the chemical composition of living cells (e.g., by creating an artificial genetic code
and incorporating noncanonical amino acids into protein biosynthesis), presents many
exciting potential applications and scientific challenges. At the same time, it also
raises ethical and societal issues. In the last chapter of this volume, Schmidt and
coworkers review the state of the art and discuss the relevant ethical and
philosophical aspects of xenobiology and new-to-nature organisms.

In summary, these ten reviews have highlighted some recently developed syn- 
thetic biology and metabolic engineering tools and their broad applications in 
industrial biotechnology and the future development of biology. We thank the 
authors for their contributions to this volume of Advances in Biochemical Engi- 
neering/Biotechnology and hope that the readers will enjoy their work as much as 
we have. 


Urbana, USA Huimin Zhao 
Hamburg, Germany An-Ping Zeng 
Orthogonal Protein Translation Using Pyrrolysyl-tRNA Synthetases for Single-
and Multiple-Noncanonical Amino Acid Mutagenesis


Tobias Baumann, Matthias Exner, and Nediljko Budisa 


Abstract To date, the two systems most extensively used for noncanonical amino
acid (ncAA) incorporation via orthogonal translation are based on the
Methanococcus jannaschii TyrRS/tRNA^Tyr_CUA and the Methanosarcina barkeri/
Methanosarcina mazei PylRS/tRNA^Pyl_CUA pairs. Here, we summarize the development
and usage of the pyrrolysine-based system for orthogonal translation, a process
that allows for the recombinant production of site-specifically labeled proteins
and peptides. Via stop codon suppression in Escherichia coli and mammalian cells,
genetically encoded biomolecules can be equipped with a great diversity of chemical
functionalities including click chemistry handles, post-translational
modifications, and photocaged side chains.


Keywords Expanded genetic code, Noncanonical amino acid, Orthogonal 
translation, Pyrrolysyl-tRNA synthetase, Stop codon suppression 
1 Introduction


Amber suppression, a widespread genetic phenomenon in bacterial species [1], can 
be used to reprogram coding sequences toward the incorporation of noncanonical 
amino acids (ncAAs) [2]. By this methodology, the gene of the target protein or 
peptide is mutated to a TAG amber stop codon at the desired site of ncAA 
incorporation. In parallel with the target, genes of an orthogonal pair, a combination 
of a suppressor tRNA and a suitable aminoacyl tRNA synthetase (aaRS), are 
expressed. Aminoacylation, namely charging the orthogonal tRNA with the 
ncAA, is commonly achieved by a wildtype or engineered aaRS which initially 
activates the ncAA via ATP. When aminoacylated tRNAs bearing a CUA antico- 
don are present, the translational machinery transfers the ncAAs to the growing 
polypeptide chain, resulting in site-specific incorporation. As the amber codon is
one of the three stop codons, amber sites within coding sequences naturally act as
translational stop signals that trigger translation termination, a multistep process
mediated by release factors [3]. One approach to improve amber suppression
efficiency is thus to knock out the essential release factor 1 (RF1), made possible
either by complementation via a mutated RF2 [4] or by removal of essential amber
stop codons from the Escherichia coli genome [5, 6].
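The decoding logic described above can be illustrated with a minimal Python sketch. It is a deterministic toy model, not a simulation of real translation: it assumes that every UAG (TAG in the DNA) is read as the ncAA whenever a charged suppressor tRNA is present, and otherwise treats it as a stop codon. The names `translate` and `full_length_fraction`, and the placeholder letter `X` for the ncAA, are hypothetical. Because suppression in vivo competes with RF1-mediated termination, the second function estimates the share of full-length product for several amber sites under the added simplifying assumption that each site is suppressed independently with the same efficiency.

```python
# Minimal sketch (hypothetical helper names): amber codon reassignment in a coding sequence.

# Standard genetic code, codons ordered base1/base2/base3 over "TCAG"; "*" marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(cds: str, suppressor_present: bool = True, ncaa: str = "X") -> str:
    """Translate a DNA coding sequence codon by codon.

    If a charged suppressor tRNA (CUA anticodon) is assumed present, each TAG (amber)
    codon is decoded as the ncAA placeholder; otherwise, like TAA and TGA, it terminates.
    """
    cds = cds.upper()
    protein = []
    for pos in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[pos:pos + 3]
        if codon == "TAG" and suppressor_present:
            protein.append(ncaa)  # site-specific ncAA incorporation
            continue
        residue = CODON_TABLE[codon]
        if residue == "*":
            break  # release-factor-mediated termination
        protein.append(residue)
    return "".join(protein)


def full_length_fraction(n_amber_sites: int, efficiency_per_site: float) -> float:
    """Expected full-length fraction if each amber site is suppressed independently (assumption)."""
    if not 0.0 <= efficiency_per_site <= 1.0:
        raise ValueError("efficiency_per_site must lie in [0, 1]")
    return efficiency_per_site ** n_amber_sites


if __name__ == "__main__":
    gene = "ATGGCTTAGAAATAA"  # Met-Ala-[amber]-Lys-stop (hypothetical test sequence)
    print(translate(gene))                            # MAXK
    print(translate(gene, suppressor_present=False))  # MA
    print(full_length_fraction(3, 0.5))               # 0.125
```

In this toy model, a per-site suppression efficiency of 50% leaves only one in eight products full length across three amber sites, which illustrates why reducing RF1 competition matters most when several ncAAs are to be incorporated into one protein.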


2 Discovery and Phylogenetic Distribution of PylRS as a Natural Orthogonal Pair


With natural stop (nonsense) codon suppression identified in bacteria, eukaryotes,
and viruses [1], amber suppression by species of Methanosarcina was investigated
intensively [12]. It was a study in 2002 focusing on the Methanosarcina barkeri
monomethylamine methyltransferase (MtmB) that led to the discovery of the
22nd proteinogenic amino acid pyrrolysine, now commonly abbreviated as
Pyl [13]. In an accompanying manuscript in the same journal, the corresponding
genes encoding the tRNA and the aminoacyl-tRNA synthetase, pylT and pylS,
respectively, were described [14]. With few exceptions, natural pyrrolysine-containing
proteins have so far been identified primarily among methyltransferase enzymes of
the methanogenesis pathway in these species [15]. As part of the biocatalyst
structure, the function of this lysine derivative is to enable the use of methylamines
as energy sources for the host cell. In contrast to the archaeal counterpart, the
bacterial enzymes discovered so far (see Fig. 1a for a phylogenetic tree based on
protein sequence alignments) are encoded by two separate genes, with a structurally
different aaRS as expression product [11]. To date, their natural function remains
unclear [15].

[Fig. 1a: phylogenetic tree of PylRS sequences from Methanosarcina and other
methanogens and from bacteria such as Desulfitobacterium and Desulfosporosinus,
with archaeal-type and bacterial-type clusters labeled; panels b and c: see caption]

Fig. 1 Phylogenetic distribution and structure of pyrrolysyl-tRNA synthetase. (a) Phylogenetic
tree of PylRS distribution reveals a separation of archaeal and bacterial forms. Sequences were
retrieved using protein BLAST [7] with Methanosarcina mazei PylRS (accession number Q8PWY1)
as query. Retrieved sequences (excluding duplicate entries from different strains and engineered
variants, hypothetical proteins, bacterial N-terminal domains, and non-pyrrolysyl-tRNA
synthetases) were aligned with the ClustalW2 web server [8]. The phylogenetic tree was visualized
using SeaView [9]. (b) Amino acid binding pocket of MmPylRS with pyrrolysyl-AMP, taken from
PDB ID 2ZIM [10]. (c) Desulfitobacterium hafniense PylRS bound to tRNA^Pyl, taken from
PDB ID 2ZNI [11]
3 Basic Features of the Natural PylRS:tRNA^Pyl System


Structurally, L-pyrrolysine is a large lysine derivative carrying a methyl-pyrroline
ring on the ε-amino group (see Fig. 2). Pyrrolysyl-tRNA synthetase (PylRS) belongs
to the class II aaRSs (subclass IIc) and bears the corresponding conserved fold of
the catalytic domain. To bind and accommodate the pyrrolysine moiety and subsequently
activate it via ATP, PylRS has an unusually large substrate binding pocket
(see Fig. 1b) [10, 11]. Key functional parameters of the synthetase were identified
in 2004, when in vitro experiments showed that PylRS catalyzes the formation of
pyrrolysyl-tRNA^Pyl in an ATP-dependent manner [16]. Structure-function studies
revealed several PylRS residues important for substrate recognition and
discrimination against other metabolites. In Methanosarcina mazei PylRS, for
example, Asn346 functions as a so-called gatekeeper residue and strongly restricts
the substrate range to a defined set of structures [17]. Accordingly, mutation of
this residue can alter the substrate spectrum so that tRNA^Pyl can be charged with
ncAAs that the wild-type enzyme rejects [18]. The target of aminoacylation,
tRNA^Pyl, also exhibits several unique features: it is more compact than
conventional bacterial tRNAs but retains a structurally similar L shape. In
conjunction with several recognition elements, these features allow discrimination
against other tRNAs [15].


4 First Engineering Reports: Substrate Range and Design 


Back in 1980, Kwok and Wong proposed that transferring a tRNA/aaRS pair from 
one organism to another could provide a route toward an expanded genetic code 
[19]. This concept was picked up in a study by Furter in 1998, where a yeast tRNA/ 
phenylalanyl-tRNA synthetase pair was shown to work in E. coli. With efficiencies 
exceeding 60%, this strategy allowed the incorporation of p-fluoro-phenylalanine 
(as naturally occurring in yeast cells exposed to the ncAA) at amber stop codon sites 
[20]. It was the pioneering work of Peter Schultz and coworkers that eventually
opened the door toward human-made orthogonal pairs. Using a tyrosyl-tRNA/synthetase
pair from the hyperthermophilic archaeon Methanococcus jannaschii combined with
mutations selected from an amber suppressor tRNA library, they accomplished
high-fidelity orthogonal translation [2].


Fig. 2 Structure of pyrrolysine
As in many follow-up studies based on the pyrrolysine system reviewed here,
iterative rounds of negative and positive selection were used to isolate
tRNA/synthetase pairs specific for the target ncAA. For negative selection, a gene
encoding the toxic protein barnase and harboring one or more amber stop codons is
commonly used: aminoacylation of the amber suppressor tRNA by endogenous
synthetases, or by an unspecific synthetase variant that charges canonical amino
acids, results in cell death. During positive selection for specific ncAA
incorporation at amber sites, libraries of tRNA/synthetase combinations are screened
for clones whose survival depends on the presence of the ncAA in the growth medium.
For this purpose, the chloramphenicol acetyltransferase (CAT) gene is frequently
employed, as its functional full-length gene product confers resistance to the
antibiotic chloramphenicol [2].
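The survival rules of this two-step scheme can be expressed as a short Python sketch. It is an idealized, all-or-nothing model under stated assumptions: positive selection (CAT reporter with an amber codon, chloramphenicol) is run with the ncAA supplied, negative selection (barnase reporter with an amber codon) is run without it, and each clone is described only by whether its synthetase loads a canonical amino acid and/or the ncAA onto the suppressor tRNA. The `Clone` type and all function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clone:
    """Hypothetical library member: one tRNA/synthetase combination."""
    name: str
    charges_canonical: bool  # suppressor tRNA loaded with a canonical amino acid
    charges_ncaa: bool       # suppressor tRNA loaded with the target ncAA


def amber_suppressed(clone: Clone, ncaa_supplied: bool) -> bool:
    """True if the amber codon in the reporter gene is read through."""
    return clone.charges_canonical or (ncaa_supplied and clone.charges_ncaa)


def survives_positive(clone: Clone) -> bool:
    # CAT reporter with an amber codon, chloramphenicol and ncAA in the medium:
    # only read-through yields full-length CAT and hence resistance.
    return amber_suppressed(clone, ncaa_supplied=True)


def survives_negative(clone: Clone) -> bool:
    # Barnase reporter with an amber codon, no ncAA: any read-through produces
    # the toxic protein and kills the cell.
    return not amber_suppressed(clone, ncaa_supplied=False)


def select(library: list[Clone], rounds: int = 2) -> list[Clone]:
    """Alternate positive and negative selection for a number of rounds."""
    pool = list(library)
    for _ in range(rounds):
        pool = [c for c in pool if survives_positive(c)]
        pool = [c for c in pool if survives_negative(c)]
    return pool


library = [
    Clone("inactive", charges_canonical=False, charges_ncaa=False),
    Clone("canonical-only", charges_canonical=True, charges_ncaa=False),
    Clone("promiscuous", charges_canonical=True, charges_ncaa=True),
    Clone("ncAA-specific", charges_canonical=False, charges_ncaa=True),
]
print([c.name for c in select(library)])  # ['ncAA-specific']
```

Only the clone that suppresses the amber codon exclusively in the presence of the ncAA passes both filters. Real selections are leaky and probabilistic, which is why several alternating rounds are run in practice; the `rounds` parameter merely mirrors that iteration.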

Besides pyrrolysine, wild-type PylRS enzymes (including the variant produced
by the Gram-positive bacterium Desulfitobacterium hafniense depicted in Fig. 1c)
recognize several alternative substrate molecules that can be activated and loaded
onto tRNA^Pyl [21]. By combining random or rationally chosen mutations with
stringent positive and negative selection systems, the substrate spectrum of the
enzyme can be broadened or reshaped significantly toward a variety of ncAAs. These
are grouped and summarized in the following sections.


5 Simple Chemical Handles and Hydroxy Amino Acids 


Several studies concerning the functional characterization of PylRS revealed a
surprisingly broad substrate tolerance, presumed to result from pyrrolysine
recognition via hydrophobicity rather than, for example, via its ω-group. Figure 3
summarizes diverse chemical handles, including hydroxy amino acids, that can be
incorporated into proteins via Methanosarcina mazei (Mm), Methanosarcina
barkeri (Mb), and Desulfitobacterium hafniense (Dh) pyrrolysine tRNA/synthetase
combinations. In addition to the natural substrate, N-ε-tert-butoxycarbonyl-L-lysine
(BocK) is efficiently transferred to tRNA^Pyl both in vitro and in vivo [31]. Within
the same study, the α-hydroxy acid N-tert-Boc-6-amino-2-hydroxy-L-hexanoic acid
(15, Boc-LysOH) was also incorporated site-specifically in E. coli. Because the
resulting ester bond in the polypeptide chain is more alkali-labile than the natural
amide bond, α-hydroxy acids allow backbone hydrolysis under mild conditions. An
MbPylRS evolved for incorporation of the azide-bearing cyclic Pyl analogue
N-ε-(((1R,2R)-2-azidocyclopentyloxy)carbonyl)-L-lysine (5, ACPK) enabled expressed
protein ligation via hydrazinolysis. First, an oxoester was cotranslationally
incorporated into the protein backbone via a noncanonical α-hydroxy acid. Second,
addition of hydrazine led to site-selective cleavage in vitro, which allowed the
subsequent ligation of a chemically synthesized cysteine-bearing peptide. In vitro
refolding of the non-natural protein fusion reconstituted a folded, active
protein [32].
Fig. 3 Chemical handles. Top row: In vivo synthesized pyrroline-carboxylysine, simple alkenes,
alkynes, and azido-amino acids for ligation chemistry. Center row: Amino acids with highly 
reactive double and triple bonds. Bottom row: Amino acids with multiple functional groups for 
complex ligation chemistry and backbone analogs. References: 1: [22]; 2, 3: [21]; 4, 9: [23]; 5: 
[24]; 6: [25]; 7: [26]; 8, 11: [27]; 10: [28]; 12, 13: [29]; 14: [30]; 15: [31] 


6 Post-translational Modifications 


In earlier PylRS studies, the Lys moiety of the substrate ncAA as well as the
N-ε-carbonyl group remained unchanged, as they represent substrate identity elements
recognized by the wild-type enzyme [18].

Despite the high interest in their study, efficient protein- and site-selective
post-translational modifications are difficult to achieve in E. coli and mammalian
cells. Solid-phase peptide synthesis, on the other hand, is limited in the maximum
polypeptide chain length. Consequently, orthogonal translation using appropriate
ncAAs and compatible orthogonal pairs presents an excellent methodology because of
its high selectivity. With methylation as a key post-translational modification
(PTM) in eukaryotic organisms, methyllysine residues could be site-specifically
created in histone proteins [34]. To reach this goal, the ncAA
N-ε-allyloxycarbonyl-N-ε-methyl-L-lysine (19 as ncAA scaffold) was incorporated
using the MbPylRS system. Histone proteins recombinantly produced as E. coli
inclusion bodies were refolded in vitro and then converted into the
methyllysine-modified variants via a ruthenium catalyst.

Ubiquitination, that is, an isopeptide linkage of a substrate protein to ubiquitin,
was achieved via an MbPylRS system and directed evolution toward several ncAA
substrates. To achieve discrimination against lysine (which differs from the desired
PTM residue only by an inserted sulfur atom), Boc protecting groups were employed.
These were removed in vitro after protein production and purification. Within the
same study, another lysine PTM, 5-hydroxy-L-lysine, was also incorporated [35].

Protein acetylation, another PTM, was achieved via cotranslational incorporation of
N-ε-acetyl-L-lysine (16, AcK). To incorporate this ncAA, an MmPylRS variant was
created by directed evolution. Careful inspection of the enzyme's catalytic
parameters revealed that its K_M values for AcK remained high, yet still permitted
relatively good production yields of modified CAT [36]. After optimization of tRNA
processing and MbPylRS expression, the same post-translational lysine modification
could be artificially created in human superoxide dismutase (hSOD) using
Saccharomyces cerevisiae as expression host [37].
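Why a high K_M need not preclude useful yields can be made explicit with the Michaelis–Menten rate law. This is a simplified sketch: it assumes that aminoacylation follows simple saturation kinetics and that the intracellular ncAA concentration [S] can be raised by supplying more ncAA to the medium. The two cases below are illustrative arithmetic only and are not values reported in [36].

```latex
\begin{aligned}
v &= \frac{k_\mathrm{cat}\,[E]_0\,[S]}{K_M + [S]}, \qquad v_\mathrm{max} = k_\mathrm{cat}\,[E]_0,\\[4pt]
\frac{v}{v_\mathrm{max}} &= \frac{[S]}{K_M + [S]} =
\begin{cases}
1/2, & [S] = K_M,\\
10/11 \approx 0.91, & [S] = 10\,K_M.
\end{cases}
\end{aligned}
```

Under these assumptions, a high K_M mainly raises the substrate concentration needed to approach v_max, while the catalytic ceiling k_cat[E]_0 is unaffected.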

Aiming at the study of chromatin modifications, three PTMs, namely
N-ε-propionyl- (17, Kpr), N-ε-butyryl- (18, Kbu), and N-ε-crotonyl-lysine (Kcr),
were successfully incorporated into histone H3 at lysine position 9. Modified target
proteins were produced in E. coli, refolded in vitro, and obtained in milligram
quantities. Although these histone modifications do not occur naturally in E. coli,
some of them were found to be partially deacylated by unknown mechanisms.
Supplementation with a deacylase inhibitor significantly reduced this deacylated
protein fraction [26]. Chemical structures of amino acid analogs enabling the creation
of post-translational protein modifications via orthogonal translation are shown in
Fig. 4.


7 Complex Chemical Handles: Crosslinkers and Photocages


Highly selective crosslinking reactions present an important tool for protein inter- 
action studies. Inevitably exposed to non-specific interactions inside the highly 
crowded environment of the cellular host, the artificially introduced chemical 
handles have to remain as stable and inert as possible. Once purified and/or exposed 
to their interaction partner, however, highly specific and fast reactions are desirable. 
At the same time, temperatures need to remain low so that protein denaturation is 
avoided or at least kept at a minimal level. Despite the availability of several 
chemical methods, those compatible with physiologic conditions, namely ambient 
temperature and close to neutral pH in aqueous solution, are consequently most 
promising. Bearing the potential to introduce new-to-nature chemistries site- 
specifically into proteins, several PyIRS variants with altered ncAA substrate 
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Fig. 4 Post-translational modifications. Top row: N-Acetylated lysines and N-methyl] lysine. N- 
Methy!] lysine has not directly been incorporated but can be introduced by chemical deprotection of 
N-e-protected and methylated lysine derivatives. Bottom row: 5-Branched lysine derivatives used 
for traceless ubiquitination. References: 16: [33]; 17, 18: [26]; 19: See [34] for an example. The 
original publication falsely states the PyIRS mutation as Y384F, which is the corresponding 
position in MmPyIRS. 20-23: [35] 


spectra were created. Structures of the following types of chemical handles suitable 
for crosslinking reactions and photocaging are depicted in Fig. 5. 


7.1 Crosslinkers 


With the selective joining of small functional groups under simple reaction condi- 
tions, the field of click chemistry meets several demands of protein crosslinking. 
Using pyrrolysine tRNA/synthetase pairs, several alkyne-containing ncAAs were
successfully introduced into proteins via orthogonal translation. With an
MbPylRS/tRNA^Pyl pair, the aliphatic azide
(S)-2-amino-6-((2-azidoethoxy)carbonylamino)hexanoic acid and the alkyne
(S)-2-amino-6-((prop-2-ynyloxy)carbonylamino)hexanoic acid were successfully
incorporated into model proteins using the E. coli translation apparatus. Via
copper-catalyzed Huisgen [3 + 2] cycloaddition, the latter modification allowed
subsequent biotin labeling in vitro [43]. A follow-up study managed to functionalize
yeast cells with this orthogonal translation system [37].

Charging tRNA^Pyl with ncAAs bulkier than pyrrolysine requires alteration and
enlargement of the PylRS substrate binding pocket. In both E. coli and mammalian
cells, the azide-bearing cyclic pyrrolysine analogue
N-ε-(((1R,2R)-2-azidocyclopentyloxy)carbonyl)-L-lysine (5, ACPK) could be
incorporated via an expanded genetic code. With this artificial chemical
functionality introduced, the biocompatible Cu(I) ligand BTTES
(2-[4-{(bis[(1-tert-butyl-1H-1,2,3-triazol-4-yl)methyl]amino)methyl}-1H-1,2,3-triazol-1-yl]ethyl
hydrogen sulfate) enabled copper(I)-catalyzed azide-alkyne cycloaddition
(CuAAC) [44]. Rational design of an MmPylRS double-alanine mutant (positions 346
and 348), for example, freed space that is occupied by side chains in the wild-type
enzyme. This enabled the efficient incorporation of phenylalanine derivatives with
large para-substituents, such as p-propargyloxyphenylalanine (35), along with six
additional ncAAs [45].

[Fig. 5 structures: photo-crosslinkers 24–26 and photocaged amino acids 27–30, shown with the
synthetase variants used for their incorporation: MbPylRS; MbPylRS L274M/C313A/Y349F;
MmPylRS Y306A/Y384F; MmPylRS Y306M/L309A/C348T/T364K; MbPylRS M241F/A267S/Y271C/L274M;
MbPylRS N311M/C313Q/V366G/W382N/R85H; MbPylRS N311Q/C313A/V366M]

Fig. 5 Crosslinkers and photocaged amino acids. Top row: Photo-crosslinkers. Bottom row:
Photocaged (methyl)lysine and cysteine. References: 24: [24]; 25: [38]; 26: [39]; 27: [40];
28: [37], [41]; 29, 30: [42]
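MbPylRS and MmPylRS variants are reported in each enzyme's own residue numbering (compare the Mm gatekeeper Asn346 discussed in Section 3 with the Mb positions N311 and C313 among the Fig. 5 variants, or the note on Y384F in the Fig. 4 caption). The Python sketch below parses such variant strings and shifts positions between the two numberings. It assumes a constant offset of 35 residues, which is consistent with the Y384F note (Mb Y349 corresponds to Mm Y384) and with other catalytic-domain pairs such as Mb N311/Mm N346 and Mb C313/Mm C348; whether the offset holds elsewhere, for example at N-terminal positions such as R85, should be checked against a sequence alignment. All function names are hypothetical.

```python
import re

# Assumption: constant Mb -> Mm numbering offset for catalytic-domain residues
# (e.g. Mb Y349 = Mm Y384, Mb N311 = Mm N346, Mb C313 = Mm C348).
# Verify by alignment before applying it elsewhere, e.g. to N-terminal positions.
MB_TO_MM_OFFSET = 35

POINT_MUTATION = re.compile(r"([A-Z])(\d+)([A-Z])")


def parse_variant(variant: str) -> list[tuple[str, int, str]]:
    """Split a variant such as 'L274M/C313A/Y349F' into (wild type, position, mutant) tuples."""
    mutations = []
    for token in variant.split("/"):
        match = POINT_MUTATION.fullmatch(token.strip())
        if match is None:
            raise ValueError(f"not a single point mutation: {token!r}")
        wild_type, position, mutant = match.groups()
        mutations.append((wild_type, int(position), mutant))
    return mutations


def mb_to_mm(variant: str, offset: int = MB_TO_MM_OFFSET) -> str:
    """Rewrite an MbPylRS variant in MmPylRS numbering (hypothetical helper)."""
    return "/".join(f"{wt}{pos + offset}{mut}" for wt, pos, mut in parse_variant(variant))


def mm_to_mb(variant: str, offset: int = MB_TO_MM_OFFSET) -> str:
    """Inverse mapping: MmPylRS numbering to MbPylRS numbering."""
    return mb_to_mm(variant, -offset)


print(mb_to_mm("L274M/C313A/Y349F"))  # L309M/C348A/Y384F
print(mm_to_mb("N346A/C348A"))        # N311A/C313A
```

A natural extension would be to check each wild-type letter against the actual enzyme sequence, since a mismatch would flag a numbering error of the kind noted for Y384F.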

Multiple reactivities were enabled by the electron-deficient olefins
N-ε-acryloyl-L-lysine (6, AcrK) and N-ε-crotonyl-L-lysine (7, CrtK), incorporated
into proteins via an evolved MmPylRS. Reactions tested with these ncAAs include a
1,4-addition for protein PEGylation, radical polymerization toward a copolymer
hydrogel, and 1,3-dipolar cycloaddition. Although wild-type PylRS afforded the
incorporation of CrtK only at low efficiency, the evolved synthetase variant enabled
target protein production levels of 25 mg/L of E. coli culture. AcrK-modified
superfolder GFP (sfGFP) efficiently reacted with thiol-containing nucleophiles,
which resulted in turn-on fluorescence. The acrylamide moiety was further employed
for labeling the outer membrane of E. coli via incorporation into OmpX [25].
Several ncAA crosslinking approaches suffer from relatively low reaction rate
constants. To tackle this limitation, the Chin group incorporated three ncAAs,
namely N-ε-L-thiaprolyl-L-lysine (14), N-ε-D-cysteinyl-L-lysine, and
N-ε-L-cysteinyl-L-lysine (12), for cyanobenzothiazole condensation. Once introduced,
the 1,2-aminothiol moiety meets key demands of bioorthogonal reactions: it does not
occur naturally in proteins, yet it allows their efficient, rapid, and specific
labeling. Multiple rounds of mutagenesis and selection afforded an MbPylRS variant
that charges tRNA^Pyl with N-ε-L-thiaprolyl-L-lysine, an ncAA that can be efficiently
deprotected with O-methylhydroxylamine to form N-ε-L-cysteinyl-L-lysine. This
residue reacts rapidly with 2-cyanobenzothiazole (CBT) at physiological temperature
and pH 7 [29].
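The practical weight of a rate constant can be illustrated with a short calculation. For a protein-bound handle reacting with a probe supplied in large excess, the second-order reaction becomes pseudo-first-order, and the half-life of the unlabeled protein is t1/2 = ln 2 / (k2 [probe]). The Python sketch below assumes exactly this idealized situation; the function names are hypothetical, and the rate constants and probe concentration are round illustrative numbers, not values reported in [29].

```python
import math


def labeling_half_life(k2: float, probe_conc: float) -> float:
    """Half-life (s) of the unlabeled protein under pseudo-first-order conditions.

    k2: second-order rate constant in M^-1 s^-1
    probe_conc: probe concentration in M, assumed in large excess and effectively constant
    """
    if k2 <= 0 or probe_conc <= 0:
        raise ValueError("k2 and probe_conc must be positive")
    return math.log(2) / (k2 * probe_conc)


def fraction_labeled(k2: float, probe_conc: float, t: float) -> float:
    """Fraction of protein labeled after t seconds: 1 - exp(-k2 * [probe] * t)."""
    return 1.0 - math.exp(-k2 * probe_conc * t)


probe = 100e-6  # 100 uM probe (hypothetical)
for k2 in (1.0, 10.0, 1000.0):  # hypothetical rate constants in M^-1 s^-1
    t_half = labeling_half_life(k2, probe)
    print(f"k2 = {k2:g} M^-1 s^-1: t1/2 = {t_half:.0f} s, "
          f"labeled after 10 min = {fraction_labeled(k2, probe, 600):.2f}")
```

In this model, a 100-fold larger k2 shortens the half-life 100-fold at the same probe concentration, which is why faster chemistries allow labeling with lower probe concentrations and shorter incubation times.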

To avoid copper-based catalysis of the crosslinking reaction, norbornene amino 
acids have proven useful for orthogonal translation and tetrazine click chemistry 
[23]. PylRS from Methanosarcina mazei was evolved via iterative saturation
mutagenesis to incorporate such ncAAs. Subsequently, the modified protein was
reacted with nitrile imines generated from hydrazonoyl chlorides or with tetrazines
in an inverse electron demand Diels–Alder reaction [46]. By employing tetrazines for
this type of reaction, rapid fluorogenic protein labeling could also be achieved with
bicyclo[6.1.0]non-4-yn-9-ylmethanol (BCN). Because of its high specificity, the
reaction between these two moieties proceeded in E. coli with low background.
Incorporation of a BCN-containing ncAA in mammalian cell culture via an evolved
MbPylRS variant allowed TAMRA fluorescence labeling via tetrazine-conjugated
fluorophores supplied to the growth medium [27].

To enrich interaction partners for mass spectrometry-based identification, a 
“click-and-release” strategy was developed. Human small ubiquitin-related modi- 
fier (SUMO) was C-terminally labeled with an alkyne-containing pyrrolysine 
analog further bearing an ester bond. This dual functionality enabled both
copper(I)-catalyzed azide-alkyne cycloaddition (CuAAC), to bind SUMOylated proteins
to a resin for enrichment, and subsequent release via mild alkaline treatment
followed by vacuum-assisted removal of the base [30].

Instead of supplying the final ncAA for tRNA charging via PylRS, cells further
equipped with an appropriate modification pathway can be used to synthesize the
desired ncAA in vivo from precursor molecules. Pyrroline-carboxy-lysine (Pcl, 1), a
multipurpose crosslinker, was produced intracellularly from D-ornithine by two Pyl
biosynthetic genes (pylC and pylD). At neutral pH, modified proteins were shown to
react efficiently with 2-aminobenzaldehyde or 2-aminoacetophenone. Consequently, the
incorporated ncAA allowed target protein PEGylation as well as labeling with diverse
substrates including peptides, oligosaccharides, oligonucleotides, fluorophores, and
biotin. Surprisingly, the demethylated analog Pcl even proved superior to pyrrolysine
during orthogonal translation [22].

To overcome limitations of the above-mentioned CuAAC, Plass et al. introduced
two previously characterized mutations to shape the MmPylRS substrate binding
pocket toward bulky pyrrolysine derivatives. This enabled the genetic incorporation
of strained alkynes with relatively large side chains such as
N-ε-((cyclooct-2-yn-1-yloxy)carbonyl)-L-lysine (10). Incorporation into the model
protein GFP was efficient, with yields exceeding 10 mg/L of E. coli culture, and
copper-free click reactions with commercially available azide-functionalized dyes
were shown, with applications in single-molecule FRET studies and bacterial cell
labeling [28].


7.2  Photocages 


To reveal chemical functionalities in a spatiotemporal manner, recombinant pro- 
teins can be functionalized with photocaged ncAAs via orthogonal translation. Key 
prerequisites for this technique include non-toxic light wavelengths and intensities 
for rapid deprotection as well as efficient genetic incorporation of the ncAAs. 
Although enzymatic labeling of target proteins is in principle an alternative for
in vitro labeling, it frequently remains incomplete, and correctly modified products
are often difficult to isolate.

As described in the previous section, lysine residues present important targets for 
PTM. Nuclear localization of eukaryotic proteins is frequently dependent on the 
presence of these residues. In yeast cells, the photocaged lysine derivative
N-ε-[(1-(6-nitrobenzo[d][1,3]dioxol-5-yl)ethoxy)carbonyl]-L-lysine (28) was
successfully used for genetic code expansion [37]. In another study, this ncAA was
incorporated into the nuclear localization sequences (NLS) of nucleoplasmin and the
tumor suppressor p53 in human cells, for which MbPylRS was evolved toward the new
substrate. By controlled photolysis of the caged residue in the ncAA-modified
proteins, Gautier et al. were able to change their cellular localization [41].

Photocaged N-ε-methyl-L-lysine was used to facilitate MmPylRS discrimination
against lysine. Positive and negative selection led to the isolation of a PylRS variant
that used the ncAA for tRNAPyl charging. Photolytic deprotection was reported
to proceed efficiently during exposure to 365 nm UV light for 1 h at physiological
pH [40].


7.3 Photo-Crosslinkers 


Compared to copper-catalyzed or chemically-induced reactions which frequently 
suffer from limited biocompatibility, photo-induced crosslinking reactions present 
a promising alternative and feature spatiotemporal control. Consequently, several 
lysine derivatives have been synthesized and tested with pyrrolysine tRNA/synthe- 
tase pairs for incorporation into proteins. 

Orthogonal translation was reported to be successful in yeast cells with
N-ε-[(2-(3-methyl-3H-diazirin-3-yl)ethoxy)carbonyl]-L-lysine (25) as a
photo-crosslinking ncAA and human superoxide dismutase (hSOD) as the target
protein [37]. Using an evolved variant instead of wild-type MbPylRS, the same ncAA
(alternatively called 3’-azibutyl-N-carbamoyl-lysine, AbK) was incorporated into
cyclin-dependent kinase 5 (Cdk5). Photoactivation with UV light (360 nm) led to
crosslinking to its substrate, p21-activated kinase 1 (Pak1), in mammalian HEK293T
cells [38]. In a similar fashion, the acid chaperone HdeA of the enteric bacterial
pathogen Shigella was used to study host cell infection mechanisms and in vivo
protein-protein interactions via a photoaffinity group.
((3-(3-Methyl-3H-diazirin-3-yl)propamino)carbonyl)-N-ε-L-lysine (DiZPK, 24) proved
acid-stable and resulted in covalent protein coupling superior to that of
p-benzoylphenylalanine (Bpa) [24].

Nitrile imines created from a tetrazole moiety via UV-irradiation can be used for 
photo-crosslinking cycloaddition reactions with norbornene-modified proteins. 
However, Kaya et al. also showed that harmful effects can arise from the UV 
irradiation required for crosslinking [46]. Compared to the norbornene moiety,
cyclopropene is less bulky and should thus be incorporated into proteins more
efficiently. Its high intrinsic reactivity stems from ring strain and enables rapid
photoinduced cycloaddition reactions. These have been demonstrated with two
tetrazoles in E. coli and mammalian HEK293 cells. Biocompatibility with
the cellular environment was further assessed by exposure to glutathione as an 
abundant biological nucleophile [47]. 

Radicals released by light can also drive copper-free crosslinking reactions. Via
anti-Markovnikov thiol-ene and thiol-yne coupling (TEC and TYC), regioselective
reactions were induced with low-energy, near-UV light (365-400 nm). Robust
product formation was achieved in aqueous buffer using VA-044 or
2,2-dimethoxy-2-phenylacetophenone (DPAP, 10%) as radical initiators. Because it
links thiols to an alkyne, TYC enabled fluorescent protein labeling with
N,N’-bis(dansyl)cystamine [48].

Red light is less harmful to biomolecules than UV light and further benefits from
greater penetration depth in biological samples and tissues; it has been successfully
employed for protein crosslinking. Rather than protein-protein interactions, a new
method developed using orthogonal translation targets the binding of proteins to
nucleic acids. Schmidt and Summerer genetically incorporated
N-ε-[2-(furan-2-yl)ethoxy]carbonyllysine (26) via amber suppression. Because the
wild-type enzyme did not yield detectable amounts of modified target protein, the
substrate binding pocket of MmPylRS was evolved toward incorporation of the
ncAA. Complex formation of an HIV-1 protein (trans-activator of transcription,
TAT) with a hairpin RNA as its natural interaction partner could be detected via the
new chemical functionality. The reactive singlet oxygen required to activate the
furan moiety was generated by a photosensitizer upon red-light irradiation [39].
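Photon energy scales inversely with wavelength (E = hc/λ), which helps explain why the near-UV light used in the crosslinking chemistries above is more damaging than red light. The short sketch below compares the two using exact SI constants; the 660-nm value is an assumed representative red wavelength, not one reported in the cited studies.

```python
# Exact SI values: Planck constant (J*s), speed of light (m/s), joules per electronvolt.
PLANCK_J_S = 6.62607015e-34
LIGHT_SPEED_M_S = 299_792_458
JOULES_PER_EV = 1.602176634e-19


def photon_energy_ev(wavelength_nm: float) -> float:
    """Energy of a single photon in electronvolts, E = h*c / wavelength."""
    wavelength_m = wavelength_nm * 1e-9
    return PLANCK_J_S * LIGHT_SPEED_M_S / wavelength_m / JOULES_PER_EV


if __name__ == "__main__":
    # 360-400 nm are near-UV values quoted in the text; 660 nm is an assumed red wavelength.
    for label, nm in [("near-UV", 360), ("near-UV", 365), ("near-UV", 400), ("red (assumed)", 660)]:
        print(f"{label:>14} {nm} nm: {photon_energy_ev(nm):.2f} eV")
```

Roughly 3.4 eV per photon at 365 nm versus about 1.9 eV at 660 nm; photon energy is only one factor, since absorption by cellular chromophores also determines photodamage.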

Fig. 6 Aromatic amino acids. Top row: Phenylalanine, methyltyrosine and halophenylalanines.
Middle rows: Phenylalanine derivatives with bulky para and meta substituents. Bottom row: 
Histidine analogs. References: 31, 33, 34: [49]; 32: [18]; 35-38: [45]; 39-42: [50]; 43-47: [51] 


8 Aromatic Amino Acid Analogs (Phe, His, Tyr) 


With the basic capability to transfer several pyrrolysine derivatives to the partner 
tRNA in vitro and in vivo, PylRS variants were successfully evolved toward
structurally different aromatic ncAAs with chemical structures as shown in Fig. 6. 

A broad-specificity double-alanine mutant of PylRS allowed efficient production
of recombinant proteins site-specifically modified with p-propargyloxy-phenylalanine
(35) supplied at 5 mM concentration [45]. Bearing short aromatic side chains,
L-phenylalanine, p-iodo- and p-bromo-L-phenylalanine were shown to be used for
tRNA aminoacylation by an evolved Methanosarcina mazei PylRS enzyme. The
iodinated amino acid analog is envisioned for X-ray crystallography (as a heavy-atom
marker otherwise introduced via crystal soaking or chemical treatment) and for
protein crosslinking via Suzuki-Miyaura reactions. Although orthogonal translation
using tyrosine analogs is frequently accomplished using the Methanococcus
jannaschii tyrosyl-tRNA synthetase (MjTyrRS)/tRNATyr pair, this system has
different structural ncAA substrate requirements compared to the pyrrolysine
system [49]. Further expansion of the structural diversity of the PylRS substrate
spectrum focused on O-methyl-L-tyrosine (Ome, 32). Using X-ray crystallography
and non-hydrolyzable ATP derivatives, the evolved MmPylRS was shown to exhibit
a decreased active site volume and compensatory mutations for lost substrate
interactions. The aforementioned “gatekeeper” residue, Asn346, was specifically
targeted for mutagenesis, and directed evolution yielded a high-fidelity synthetase.
Including genetic adjustments for tRNA processing, the Ome-specific enzyme also
proved functional in mammalian HEK293 and HeLa cells [18]. Targeting PylRS
residue Asn346 for mutagenesis, an N346S:C348I double mutant of the M. mazei
aminoacyl-tRNA synthetase was also found to be “polyspecific”, enabling the
incorporation of several meta-substituted phenylalanine-based aromatic ncAAs [52].
Using chemical ncAA synthesis starting from tyrosine, red-shifted photoswitchable
azobenzenes were incorporated into sfGFP [53] via an MmPylRS variant evolved in
a previous study for azobenzenes photoswitchable via 365-nm light [54].


9 Multiple Noncanonical Amino Acid Mutagenesis 


In principle, incorporating any of the above-mentioned noncanonical amino acids at 
multiple defined sites in a target protein via stop codon suppression is feasible. 
Incorporation of two different ncAAs can also be achieved, for example by com- 
bining PylRS-based amber suppression with a quadruplet-decoding MjTyrRS- 
based orthogonal pair [55]. Introduction of more and more stop codons in the target 
gene, however, decreases the final protein yields obtainable via recombinant 
expression. Toward more efficient single and multiple ncAA incorporation via 
stop codon suppression, tRNAPyl has been rationally evolved, with efficiency
improvements expected to stem from interactions with E. coli elongation factor
Tu (EF-Tu) [56]. As introduced above, E. coli strains with attenuated [57] or
deleted RF1 facilitate amber suppression at multiple sites, both in vivo and in
cell-free systems [58-60]. To minimize toxicity resulting from ncAA incorporation
at off-target sites in the host cell proteome, genomic recoding has yielded RF1-free
E. coli strains in which the amber stop codons of 95 essential genes or of all
protein-coding genes, respectively, were replaced [6, 61]. Whereas the unique
structural features of tRNAPyl establish the orthogonality of PylRS-based amber
suppression, recent in vitro studies revealed multiple steps in orthogonal translation
that limit its efficiency [62]. Consequently, further engineering of expression strains
and plasmid setups can be expected to improve ncAA incorporation.
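Why yields fall as stop codons accumulate can be illustrated with a deliberately idealized model: if each amber site is read through independently with the same probability p (instead of being terminated by RF1), the fraction of full-length protein is p^n for n sites. The independence assumption and the example efficiencies below are illustrative only; real suppression efficiency depends on the orthogonal pair, host strain, and codon context.

```python
def full_length_fraction(per_site_efficiency: float, n_sites: int) -> float:
    """Fraction of translation events that read through all n amber codons,
    assuming every site is suppressed independently with the same probability."""
    if not 0.0 <= per_site_efficiency <= 1.0:
        raise ValueError("per-site efficiency must lie between 0 and 1")
    if n_sites < 0:
        raise ValueError("number of sites cannot be negative")
    return per_site_efficiency ** n_sites


if __name__ == "__main__":
    # Hypothetical per-site efficiencies, chosen only to show the trend.
    for p in (0.9, 0.5):
        fractions = [round(full_length_fraction(p, n), 3) for n in range(1, 5)]
        print(p, fractions)
```

In this picture, removing RF1 raises the per-site read-through probability toward 1, which is why the benefit of RF1-deficient strains grows with the number of sites.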


10 Outlook and Perspectives 


During the past decade, orthogonal protein translation has become well established.
Orthogonal tRNA/synthetase pairs have been made compatible with host organisms 
such as E. coli, S. cerevisiae, mammalian cell culture, C. elegans, and 
D. melanogaster. Specialized fields of biological research, for example pathogen 
microbiology and virology (see previous sections), now employ the diversity of 
ncAAs to modify proteins site specifically with high efficiency. Using orthogonal 
pairs, diverse methods have become available for precise fluorescent protein label- 
ing or selective crosslinking in vitro and in vivo. 

Limitations revealed during the development of synthetase enzymes which 
selectively charge their partner tRNA with ncAAs have been recognized and 
addressed. For instance, in vitro assays reveal catalytic efficiencies of evolved 
synthetases and allow fine-tuning toward higher orthogonal pair efficiency 
[63]. Protein structures have been determined not only for the wild-type enzymes 
but also for several variants generated by directed evolution. In the near future these 
data should allow the generation of precisely designed PylRS active site libraries as
a new method to obtain an even more diverse ncAA substrate spectrum. As 
illustrated by the incorporation of pyrroline-carboxy-lysine (Pcl, 1), the production 
of the ncAA from less complex and thus more affordable precursor molecules can 
be achieved by the same cellular host which is able to incorporate it genetically into 
proteins. The development of crosslinking agents and photo-induced reactions 
follows a route toward high biocompatibility, at the same time maintaining selec- 
tivity and reaction speed. Consequently, many new applications of orthogonal 
translation are expected in the near future. 

Because of the large and constantly increasing number of orthogonal pairs 
reported so far, it should be noted that this work cannot completely cover all 
developments in the PylRS research field. For the same reason, developments of
other orthogonal pairs such as those based on MjTyrRS could not be covered herein. 
Consequently, readers are referred to reviews such as those of Neumann or Liu and 
Schultz for further references [64, 65]. 

Promoter and Terminator Discovery
and Engineering 


Matthew Deaner and Hal S. Alper 


Abstract Control of gene expression is crucial to optimize metabolic pathways 
and synthetic gene networks. Promoters and terminators are stretches of DNA 
upstream and downstream (respectively) of genes that control both the rate at 
which the gene is transcribed and the rate at which mRNA is degraded. As a result, 
both of these elements control net protein expression from a synthetic construct. 
Thus, it is highly important to discover and engineer promoters and terminators 
with desired characteristics. This chapter highlights various approaches taken to 
catalogue these important synthetic elements. Specifically, early strategies have 
focused largely on semi-rational techniques such as saturation mutagenesis to 
diversify native promoters and terminators. Next, in an effort to reduce the length 
of the synthetic biology design cycle, efforts in the field have turned towards the 
rational design of synthetic promoters and terminators. In this vein, we cover 
recently developed methods such as hybrid engineering, high throughput charac- 
terization, and thermodynamic modeling which allow finer control in the rational 
design of novel promoters and terminators. Emphasis is placed on the methodo- 
logies used and this chapter showcases the utility of these methods across multiple 
host organisms. 

1 Introduction


Promoters and terminators play an indispensable role in metabolic engineering and 
synthetic biology applications for controlling gene expression. These critical ele- 
ments play a part in regulating both the strength of transcription and the longevity 
of the transcript. Together, these two forces dictate the overall abundance of mRNA
within the cell and ultimately play a significant role in determining protein contents 
within cells. At the same time, optimizing microorganisms for chemical production 
via metabolic engineering often requires the use of these elements to create highly 
regulated intracellular flux [1], often through high-strength promoters [2]. Fine- 
level control, inducibility, and expression range are all quite important in these 
endeavors, as has been seen with large strain engineering efforts such as rewiring 
the yeast Saccharomyces cerevisiae for industrial-level heterologous artemisinin 
production [3]. Fortunately, our understanding and cataloging of synthetic control 
elements such as promoters and terminators is continuously improving. In this 
chapter we consider the selection and engineering of both promoters and termi- 
nators for a variety of possible host organisms. Initially, we describe early strategies 
which mainly relied on genome mining and semi-rational mutagenesis techniques 
to improve sequence diversity and function. Next, we describe recent advances in 
the design of these parts using techniques such as hybrid engineering, high- 
throughput characterization, thermodynamic modeling, synthetic part 
development, and rational design. In each of these cases, both our understanding
and the utility of these parts are enhanced, thereby accelerating the design cycle
for engineering cells.
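The combined effect of transcription rate and transcript stability described above can be captured by a textbook one-compartment model of mRNA, dm/dt = α - δm, whose steady state is m* = α/δ. In this simplified sketch (an assumption for illustration, not a model used in this chapter), α stands in for promoter-driven transcription and δ for the first-order decay constant that terminator choice can influence.

```python
def steady_state_mrna(transcription_rate: float, degradation_rate: float) -> float:
    """Steady state of dm/dt = alpha - delta * m, i.e. m* = alpha / delta.

    transcription_rate (alpha): transcripts made per unit time, set largely by the promoter.
    degradation_rate (delta): first-order mRNA decay constant per unit time.
    """
    if degradation_rate <= 0:
        raise ValueError("degradation rate must be positive")
    return transcription_rate / degradation_rate


if __name__ == "__main__":
    # Arbitrary units: a twofold stronger promoter and a twofold more stable
    # transcript raise steady-state mRNA by the same factor.
    print(steady_state_mrna(10.0, 0.5))
    print(steady_state_mrna(20.0, 0.5))
    print(steady_state_mrna(10.0, 0.25))
```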


2 Early Efforts of Promoter Identification 
and Diversification 


2.1 Native Promoter Mining 


The initial set of catalogued promoters for synthetic use was derived from the 
genome of the host organism or a phage that targets the host organism [4-8]. These 
promoters were often uncovered as a result of genomic dissections. The advent of 
genome sequencing and annotation (especially of hosts such as Escherichia coli 
and S. cerevisiae) allowed for the rapid discovery of endogenous promoters, 
especially when coupled with mRNA quantification methods. In a similar fashion, 
promoters for more complex systems such as mammalian hosts have largely been 
discovered via high-throughput screening methods such as “promoter trapping” [9-11].
This approach typically involves random integration of a promoter-less vector
containing GFP followed by fluorescence-based selection to determine adjacent, 
upstream regions of the genome that enable transcription. In similar fashion to other 
hosts, the sequencing of genomes (such as the CHO genome [12]) allowed for the 
discovery of novel, dynamic promoters such as pTXnip, which expresses propor- 
tionally to cell density [13]. 

Libraries of native promoters serve an important role as major synthetic parts 
and are among the most highly characterized [14, 15]; however, they remain limited 
in their ability to sample complete gene expression ranges. Although multiple gene 
overexpression techniques have been used in E. coli [16-18] and S. cerevisiae [19-22],
among other organisms, this approach is limited and can lead to the build-up
of toxic intermediates that reduce productivity [23]. In some cases, including
commonly used native promoters in S. cerevisiae, dependencies such as
carbon-source metabolism [24] can impact part performance. Such conditional function is
exacerbated in mammalian hosts, as commonly-used viral promoters vary widely in 
performance between cell lines and are often unstable after many cell generations 
[25-27]. As a result, further engineering of promoters is necessary to obtain desired 
fine-tuned expression, stability, and conditional performance. 


2.2 Mutagenesis Techniques to Diversify Promoter Strength 


Random mutagenesis is a powerful approach to augment promoter function without 
explicitly requiring extensive knowledge of sequence-to-function mapping. 
Specifically, because mutagenesis techniques such as error-prone PCR (Ep-PCR)
indiscriminately target both consensus and non-consensus promoter regions,
libraries with a large dynamic range of promoter function can be easily obtained. For
instance, error-prone PCR was used to generate a mutant library of the prokaryotic
PL-λ bacteriophage-derived promoter, enabling a 196-fold dynamic range of
expression in E. coli [28]. The utility of this library was demonstrated by optimizing
the expression of phosphoenolpyruvate carboxylase (ppc) for biomass yield and
deoxy-xylulose-P-synthase (dxs) for maximal lycopene production. The importance
of an expression continuum was highlighted by the fact that optimal dxs expression
depended on the genetic background of the strain. Similar mutagenesis of the
strong constitutive S. cerevisiae TEF1 promoter yielded a library exhibiting a
15-fold dynamic range [28, 29]. Likewise, this library was used to optimize
glycerol 3-phosphate dehydrogenase (GPD1) expression for glycerol
overproduction in yeast.
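The two ideas in this paragraph, random substitutions spread over the whole promoter and the dynamic range of the resulting library, can be sketched in a few lines. This is a simplified model under stated assumptions: each base mutates independently at a fixed rate to one of the three other bases, whereas real Ep-PCR has polymerase-specific biases and also produces small indels. The parent sequence and reporter readouts are arbitrary placeholders, not data from the cited studies.

```python
import random


def error_prone_copy(seq: str, error_rate: float, rng: random.Random) -> str:
    """Copy seq, substituting each base with probability error_rate.

    The replacement is drawn uniformly from the other three bases (a simplification).
    """
    out = []
    for base in seq:
        if rng.random() < error_rate:
            out.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            out.append(base)
    return "".join(out)


def dynamic_range(strengths: list[float]) -> float:
    """Fold-difference between the strongest and weakest (nonzero) measured variant."""
    return max(strengths) / min(strengths)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the example library is reproducible
    parent = "TTGACAGGCTAGCTCAGTCCTAGGTATAATGC"  # arbitrary example sequence
    for _ in range(5):
        print(error_prone_copy(parent, 0.02, rng))
    print(dynamic_range([0.5, 3.0, 98.0]))  # hypothetical reporter readouts
```

With the hypothetical readouts above, the library spans a 196-fold range (98 / 0.5).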

As an alternative to Ep-PCR, serial deletion of promoter regions has been used to 
modulate expression, especially for mammalian hosts. Initially, serial deletion was 
used as a genetic tool to systematically remove portions of a promoter sequence to 
better understand function [30-32]. As these deletions often tend to dampen 
promoter activity, this approach has recently been used to generate libraries of 
weaker promoters [33, 34]. In this regard, serial deletion has been used to create 
knockdown libraries of glutamine synthetase (GS) expression for the GS-CHO 
expression system [35]. Moreover, serial deletion can also identify promoter var- 
iants that are cell-line specific. For example, the human cytomegalovirus (hCMV) 
promoter was optimized for transgene expression in both CHO-K1 and HEK-293 
cells [36]. This study found that the full-length promoter gave the highest stable 
expression in CHO-K1 cells whereas the addition of the first exon to the minimal 
enhancer and core promoters was optimal for expression in HEK293 cells. 
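Planning a serial deletion series is simple to express in code: successively shorter fragments are taken from the 5' end while the 3' end next to the transcription start site is kept fixed. The sketch below assumes fixed-step truncations from the 5' end only; published studies often place deletion endpoints at annotated elements instead, and the helper name is hypothetical.

```python
def serial_deletions(promoter: str, step: int, min_length: int = 1) -> list[str]:
    """5' truncation series: drop `step` bases at a time from the upstream end,
    keeping the 3' (start-site-proximal) end fixed."""
    if step <= 0:
        raise ValueError("step must be positive")
    return [
        promoter[start:]
        for start in range(0, len(promoter), step)
        if len(promoter) - start >= min_length
    ]


if __name__ == "__main__":
    for fragment in serial_deletions("ACGTACGTAC", 3):
        print(fragment.rjust(10))  # right-aligned to show the shared 3' end
```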

Although Ep-PCR and serial deletion are effective at creating a large dynamic 
range of promoter strength, these approaches suffer from two major deficiencies: 
(1) higher level expression is hard to achieve and (2) large pools of inactive mutants 
are generated because of aberrant mutagenesis of elements critical for transcription 
[2]. Newer techniques (described in the sections below) are required to gain higher 
expression consistently. To address the second limitation of large inactive pools, 
more targeted approaches that make use of molecular understanding of promoter 
function can be employed. As an example, a saturation mutagenesis approach 
(Fig. 1a) was used to specifically modulate the sequence between the consensus -35
“TTGACA” and -10 “TATAAT” motifs [37]. As these two motifs are both necessary
and sufficient for recognition by the σ70 factor of bacterial RNA polymerase (RNAP)
to initiate transcription [38], a randomized linker region was generated that
resulted in a promoter library with a 400-fold dynamic range in Lactococcus
lactis [39]. To improve the dynamic range further, a library including mutations of
the -35 and -10 motifs exhibited another three orders of magnitude in range, thus
demonstrating the importance of the entire promoter sequence [39].
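A saturation-mutagenesis design of this kind keeps the consensus motifs fixed and randomizes only the spacer. The sketch below uses the -35 and -10 consensus motifs and the 17-bp spacing given in the text and in Fig. 1a; the function names are hypothetical, and the cited libraries may have randomized positions differently.

```python
import random

MINUS_35 = "TTGACA"  # consensus -35 motif
MINUS_10 = "TATAAT"  # consensus -10 motif
SPACER_LENGTH = 17   # optimal spacing between the two motifs


def random_spacer_promoter(rng: random.Random) -> str:
    """Fixed consensus motifs flanking a fully randomized 17-bp spacer."""
    spacer = "".join(rng.choice("ACGT") for _ in range(SPACER_LENGTH))
    return MINUS_35 + spacer + MINUS_10


def theoretical_library_size(n_randomized: int) -> int:
    """Distinct sequences when n positions are fully randomized (N = A/C/G/T)."""
    return 4 ** n_randomized


if __name__ == "__main__":
    rng = random.Random(42)
    for _ in range(3):
        print(random_spacer_promoter(rng))
    print(theoretical_library_size(SPACER_LENGTH))  # 17179869184 possible spacers
```

Because the theoretical space (4^17, about 1.7 × 10^10 spacers) vastly exceeds any screening capacity, such libraries sample only a small subset, which is one reason characterized libraries are reported as tens to hundreds of selected variants.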

Eukaryotic promoters, although more complex and less rigidly defined than
prokaryotic counterparts, can be broken down into a core promoter [40, 41] and
upstream enhancer element(s) [42, 43] located 5’ of the core promoter. Efforts to
engineer these distinct elements have been successful. For example, Jeppsson
et al. created an ENO1-based promoter scaffold (Fig. 1b) containing two GCR1p
TFBSs, two Rap1p TFBSs, and a TATA box coupled by spacers whose length was
based on the architecture of native promoters [44, 45]. Randomization of these
spacer regions afforded 37 synthetic promoters that spanned 3 orders of magnitude
in strength. The utility of this library was demonstrated for the controlled
knockdown of ZWF1 expression, resulting in a 16% increase in yeast ethanol
production from xylose fermentation. Finally, this same approach of creating
synthetic promoter scaffolds followed by saturation mutagenesis has been applied to
mammalian promoters (Fig. 1c): mutagenesis of regions between TFBSs in the JeT
promoter afforded a library of weakened synthetic promoters with a tenfold range [46].

Fig. 1 Saturation mutagenesis strategies used to diversify promoters and improve understanding
of promoter design rules. (a) Prokaryotic promoters have a highly constrained architecture with
consensus -35 and -10 motifs spaced by exactly 17 base pairs for optimal function. (b, c)
Eukaryotic promoters lack a rigidly defined consensus architecture. (b) In yeast, promoters can be
broken down into an upstream activating sequence (UAS) containing transcription factor binding
sites (TFBSs), such as those for GCR1p (CT-box) and Rap1p (RPG-box), and a core promoter
which serves to recruit RNA polymerase II. (c) In mammalian hosts, promoters follow a similar
general architecture but contain additional consensus motifs such as the initiator element (INR,
shown above), transcription factor IIB recognition sequence (BRE), motif ten element (MTE), and
downstream promoter element (DPE)

Collectively, these early mutagenesis techniques demonstrate that utilizing 
native promoters (prokaryotes) or constructing synthetic promoters (eukaryotes) 
followed by randomization of spacer regions mainly yields libraries of promoters
weaker than the parent. Although these approaches remain in use, a greater
understanding of promoter architecture and high-throughput characterization tech- 
niques have yielded new methods to design promoters rationally with highly 
specific expression characteristics as described in the following sections. 

3 Rational Construction of Promoters with Desired
Characteristics 


3.1 Hybrid Promoter Engineering 


Once essential components of promoter architecture are defined, it is possible to 
combine disparate elements in a “hybrid promoter engineering” scheme. Impor- 
tantly, in contrast to Ep-PCR and saturation mutagenesis, the construction of hybrid 
promoters often yields synthetic promoters which are stronger than the core scaf- 
fold [2]. Thus, this technique serves as a potent way to amplify the expression of 
promoters, an important goal of many engineering endeavors. The first instance of
hybrid promoter engineering involved the fusion of the trp and lac promoters to
create the tacI and tacII promoters [47]. Notably, these promoters were between
7 and 11 times stronger than the derepressed lac promoter while maintaining the
same regulation. Similar approaches have been used in E. coli to generate
regulated promoters. For instance, a strong binding site for the FadR
transcription factor was placed upstream of the strong phage promoters PL and
PT7 to create a dynamic biosensor-regulator for acyl-CoA conversion to fatty acids
in E. coli [48]. A similar concept was used to produce a malonyl-CoA-responsive
hybrid promoter that controlled flux from acyl-CoA to malonyl-CoA [49]. However,
prokaryotic promoters may also be limited by promoter escape after transcript 
initiation, meaning that the addition of redundant hybrid elements is not guaranteed 
to improve transcription and can reduce transcription in some cases [50]. 

Unlike prokaryotic promoters, eukaryotic promoters are largely enhancer-limited,
meaning that the addition of enhancer elements (by including additional binding
sites) can both regulate and amplify promoter activity (Fig. 2a) [51]. Combining
previously isolated upstream activating sequences (UASs) from CYC1 [52, 53],
CLB2 (UASCLB) [54], CIT1 (UASCIT) [55], GAL1-10 (UASGAL) [56], and TEF1
(UASTEF) [51] with core promoters such as GPD (PGPD) [24], TEF1 (PTEF) [4],
LEU2 (PLEUm) [52], and CYC1 (PCYC) [57] can result in a predictable increase in
transcriptional activity [51]. Ultimately, the strongest constitutive promoter in yeast
was generated, with mRNA levels 2.5-fold higher than those of the GPD promoter
[24]. Hybrid yeast promoters can also be designed for altered regulation. For
example, linking various elements of UASGAL to a constitutive core results in a
functional, galactose-inducible promoter [51]. A similar approach has been applied
to regulated regions of the ARO9 UAS [58]. Collectively, these approaches resulted
in a library of galactose-inducible promoters with a 40-fold range in induced
expression strength, and a tryptophan-inducible promoter with a 29-fold range in
induced expression strength. This hybrid promoter approach has been extended to
non-conventional yeasts such as Yarrowia lipolytica. For example, hybrid
engineering on the LEU2 core promoter resulted in a constitutive promoter library
with a 400-fold range in expression [49]. Most importantly, this work demonstrated
the generalizability of the hybrid promoter approach to multiple
core promoters and alternative UAS elements [59]. Such strong promoters were
used in the rewiring of Y. lipolytica, in which constitutive overexpression of DGA1
using the UAS1B8-TEF hybrid promoter (among other genetic changes) resulted
in a 60-fold improvement in lipogenesis [60].

Fig. 2 Promoter engineering strategies. (a) Hybrid promoter engineering uses combinations of
sequence motifs to modify expression and regulation. (b) Synthetic promoter scaffolds may be
constructed based on native promoters with desired characteristics. These scaffolds can then be

Finally, the hybrid promoter approach has been further generalized to mamma- 
lian systems. For instance, the binding site of repressor PDX1 in the hCMV 
promoter was removed, enhancing expression fourfold in transient luciferase 
experiments [61]. The traditional additive hybrid approach has also been general- 
ized to mammalian hosts to increase expression [62], improve transgene expression 
in specific hosts [63, 64], and impart novel regulation on promoters. As an example, 
a strong, cold-inducible promoter was created by fusing a mild-cold-responsive
enhancer (MCRE) to the hCMV promoter [65]. Using this promoter and shifting
temperature from 37°C to 32°C afforded sixfold higher erythropoietin production. 
Collectively, these results indicate that the hybrid promoter approaches are useful 
in both increasing net expression and imparting unique regulation. 
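At its simplest, the additive hybrid design described in this section places one or more tandem copies of an activating element upstream of a core promoter and then enumerates combinations. The sketch below assumes a purely additive, position-independent layout and uses placeholder sequences; in practice spacing, orientation, and context strongly affect the outcome. The label format (for example "UAS1B8-TEF" for eight UAS1B copies upstream of a TEF core) is used here only as a naming convention.

```python
from itertools import product


def hybrid_promoter(uas: str, copies: int, core: str) -> str:
    """Tandem copies of an upstream activating sequence placed 5' of a core promoter."""
    if copies < 0:
        raise ValueError("copies cannot be negative")
    return uas * copies + core


def design_names(uas_names, copy_numbers, core_names):
    """Enumerate hybrid designs as 'UAS name + copy number - core name' labels."""
    return [f"{u}{n}-{c}" for u, n, c in product(uas_names, copy_numbers, core_names)]


if __name__ == "__main__":
    uas = "ACGTTACGTA"         # placeholder UAS sequence (hypothetical)
    core = "TATAAAAGGCATCCAT"  # placeholder core promoter (hypothetical)
    print(hybrid_promoter(uas, 3, core))
    print(design_names(["UAS1B"], [4, 8], ["TEF", "LEU"]))
```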


3.2 Synthetic Promoter Scaffolds and Libraries 


More recently, efforts have been made to establish synthetic and/or orthogonal 
[66, 67] promoters. Certainly bacterial systems can take advantage of the T7 RNA 
polymerase system [68] to generate short, synthetic, and orthogonal promoters for 
usage in logic gates [69-71]. However, the diversity of synthetic prokaryotic pro- 
moters is limited by the strict consensus promoter architecture not found in eukary- 
otes. To create a library of orthogonal core promoters in S. cerevisiae, native 
promoters were screened over a wide range of growth conditions to find a promoter 
scaffold that would exhibit the least amount of natural regulation [67]. The resulting 
candidate promoter, PFY1 (PPFY1), was then deconstructed to produce a minimal
promoter scaffold (Fig. 2b) containing the ~100-bp core promoter, a Reb1p binding 
site, and a poly-dT element that maintained nucleosome depletion and constant 
DNA bending for constitutive RNA polymerase II access. By randomizing the 
spacer regions within this core promoter, a library of 36 minimally-regulated pro- 
moters with a 10-fold dynamic range in expression was created. This same meth- 
odology has been generalized to other organisms including Pichia pastoris, where 
four natively regulated promoters were sequence-aligned to create a set of minimal
core promoters from which sequence elements were transferred to modify the
native AOX1 promoter [72]. This same approach has been applied to human liver
cells, where a synthetic promoter scaffold with enhanced TF binding was created via
the alignment of the hCMV and hEF1α promoters [64].

Fig. 2 (continued) diversified using saturation mutagenesis and modified via hybrid promoter
engineering. (c) Minimal synthetic core and enhancer elements may be selected using
randomization followed by FACS. (d) Promoter elements have been fully characterized via
high-throughput oligo library synthesis followed by FACS sorting into different expression bins.
(e) Expression can be tuned by altering nucleosome occupancy using a nucleosome prediction
model or via addition of nucleosome-disfavoring poly(dA:dT) tracts. Chromatin regulators (CRs)
can program a diverse range of transcriptional logic when targeted to synthetic promoters, thus
creating more efficient synthetic circuits

In an effort to generate more minimal synthetic promoters, Redden and Alper [73]
developed an S. cerevisiae minimal core promoter scaffold (Fig. 2c) by dissecting
both the core element and the UAS element and identifying functional, minimal
units through a library-based approach involving FACS analysis and a series of
robustness tests. Ultimately, a series of nine generic core elements with limited
homology to the genome were isolated. The same methodical workflow was used to
isolate six synthetic 10-bp UAS sequences that activated these synthetic core
promoters. Finally, these elements were combined to generate a minimal promoter
with 70% of the activity of the GPD promoter at an 80% reduction in size.
Importantly, these promoters represent a minimal scaffold with highly defined
consensus regions similar to those of prokaryotic promoters, and these elements
may thus be further rationally engineered for desired characteristics. Similarly,
in HeLa cells, synthetic 100-bp enhancers were created by constructing a library
containing tandem repeats of random, microarray-printed 10-bp oligonucleotides
[74]. This approach yielded an enhancer with twice the strength of the hCMV
enhancer. Thus, rationally constructing purely synthetic libraries can result in
novel promoters with prescribed function across multiple hosts.
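The tandem-repeat construction described for HeLa cells can be illustrated with a short sketch. It assumes, for illustration only, that each library member repeats a single random 10-mer ten times to reach 100 bp; the function names are hypothetical, and in silico random sampling stands in for the microarray-printed oligonucleotides used in [74].

```python
import random

BASES = "ACGT"


def random_oligo(length, rng):
    """Return a random DNA oligonucleotide of the given length."""
    return "".join(rng.choice(BASES) for _ in range(length))


def tandem_enhancer(oligo, copies):
    """Build a candidate enhancer as head-to-tail tandem repeats of one oligo."""
    return oligo * copies


def enhancer_library(n_variants, oligo_len=10, copies=10, seed=0):
    """Hypothetical library: each member repeats a distinct random oligo.

    With the defaults, each member is 10 copies of a 10-mer, i.e. 100 bp.
    """
    if n_variants > len(BASES) ** oligo_len:
        raise ValueError("more variants requested than distinct oligos exist")
    rng = random.Random(seed)
    seen = set()
    library = []
    while len(library) < n_variants:
        oligo = random_oligo(oligo_len, rng)
        if oligo in seen:  # keep every library member unique
            continue
        seen.add(oligo)
        library.append(tandem_enhancer(oligo, copies))
    return library


for member in enhancer_library(5):
    print(len(member), member[:20] + "...")
```

In practice each member would be placed upstream of a fixed core promoter and screened; the design step itself is cheap compared with the screen.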


4 Sequence-Level Prediction and Specification of Promoters


Most of the methods described above rely heavily on repeated iterations of the
synthetic biology design-build-test cycle [75, 76]. In contrast, the ability to specify
promoter function at the DNA level would greatly accelerate the field of synthetic
biology by reducing the number of design cycles. This section describes the main
efforts that have been made toward this end.


4.1 Promoter Characterization and Standardization 


Promoters, composed of a vast array of distinct regulatory elements, behave as a
system that integrates inputs from the host to produce an output: gene expression.
As high-throughput oligo synthesis [77] and the quantification of DNA, mRNA, and
protein levels have improved, large combinatorial libraries can now be generated to
measure promoter performance across a wide range of contexts (Fig. 2d). For
instance, in prokaryotes, the ribosome binding site (RBS) controls the binding
of the ribosome to the mRNA transcript, thus regulating gene expression at the
translational level, whereas the promoter regulates expression at the transcriptional
level. The independent function of these two regulatory elements has been
thoroughly characterized and modeled via the construction of a library containing
combinations of 114 promoters and 111 RBSs [78]. Although the model could
explain 96% of the variation in RNA levels, it explained only 82% of the variation
in protein levels, reflecting the complex regulation of prokaryotic gene expression
at the translational level. Thus, it is important to consider RBS performance when
designing expression cassettes for pathways.
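If promoter and RBS act independently, expression from a promoter-RBS pair should be roughly the product of a promoter term and an RBS term, i.e. additive in log space. The sketch below fits such a log-additive model to a complete promoter × RBS grid and reports the fraction of variance explained. This model form is an illustrative assumption, not necessarily the model used in [78], and the toy values are made up.

```python
def fit_additive_log_model(log_expr):
    """Fit log(expression) = mu + a_promoter + b_rbs on a complete promoter x RBS grid.

    For a complete, balanced grid the least-squares fit of this additive
    (multiplicative on the linear scale) model is row mean + column mean - grand mean.
    """
    n_p, n_r = len(log_expr), len(log_expr[0])
    grand = sum(map(sum, log_expr)) / (n_p * n_r)
    row_means = [sum(row) / n_r for row in log_expr]
    col_means = [sum(log_expr[p][r] for p in range(n_p)) / n_p for r in range(n_r)]
    return [[row_means[p] + col_means[r] - grand for r in range(n_r)]
            for p in range(n_p)]


def r_squared(observed, predicted):
    """Fraction of the variance in observed values explained by predicted values."""
    obs = [x for row in observed for x in row]
    pred = [x for row in predicted for x in row]
    mean = sum(obs) / len(obs)
    ss_tot = sum((o - mean) ** 2 for o in obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    return 1.0 - ss_res / ss_tot


# Hypothetical log-scale data: 3 promoters x 3 RBSs, with one non-additive entry.
toy = [[0.0, 1.0, 2.0],
       [0.5, 1.5, 2.5],
       [3.0, 4.0, 6.0]]
print(round(r_squared(toy, fit_additive_log_model(toy)), 3))
```

Any variance left unexplained by the additive fit points to promoter-RBS interactions, which is one way to read the gap between RNA-level and protein-level predictions.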

Eukaryotic transcription is regulated by a complex “program” of TF binding and
RNAP II recruitment, from which underlying “design rules” can be extracted that
determine how the orientation, copy number, and context of TFBSs affect
transcription. To parse these design rules, Sharon et al. [79] created a combinatorial
library varying these parameters for 75 transcription factors. Fluorescence-activated
cell sorting (FACS) coupled with high-throughput sequencing of 6,500
barcoded promoters generated a large dataset that uncovered regulatory design
rules for TFs. For instance, in promoters that contained a Gcn4p binding site,
expression and binding site location were related via a periodic function. Using a
similar high-throughput characterization technique in mouse liver cells, it was
possible to rapidly screen thousands of rationally designed enhancer haplotype
variants [80]. This study found that enhancers are highly robust to single nucleotide
variation (SNV), but that combinations of SNVs have an additive negative effect on
function. This study also identified novel expression-enhancing motifs and
characterized predicted TFBSs, thus laying the foundation for future enhancer design
rules. In mammalian hosts, a similar predictive model has been used to identify
K-mers that denote enhancers recognized by certain TFs [81, 82]. This model can
be trained on ChIP-seq data [83] to predict enhancers throughout the genome.
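K-mer-based sequence models start by converting each sequence into counts of short words. The sketch below shows that feature step and a simple linear score on top of it. The linear scorer is a simplified stand-in, not the classifiers of [81, 82], and the weights shown are hypothetical; in practice they would be learned from data such as ChIP-seq peaks or sorted library measurements.

```python
from collections import Counter
from itertools import product


def kmer_counts(seq, k):
    """Count every overlapping k-mer in a DNA sequence (case-insensitive)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_feature_vector(seq, k):
    """Counts for all 4**k possible k-mers, in fixed lexicographic (A<C<G<T) order."""
    counts = kmer_counts(seq, k)
    return [counts["".join(p)] for p in product("ACGT", repeat=k)]


def linear_kmer_score(seq, weights, k, bias=0.0):
    """Score = bias + sum of weight[kmer] * count(kmer); weights are hypothetical."""
    counts = kmer_counts(seq, k)
    return bias + sum(w * counts[kmer] for kmer, w in weights.items())


# Hypothetical weights for two 3-mers, for illustration only.
print(linear_kmer_score("TGACTCAGG", {"TGA": 1.5, "CTC": 0.5}, k=3))
```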

Whereas TFBSs with a well-characterized function may be added to tune 
expression rationally, sequence-function mapping for core promoters is less
understood. The core promoter sequence determines how RNAP II binds in the TATA
region, forms the pre-initiation complex to unwind the DNA directly downstream, 
scans for a TSS, and initiates transcription [84-86]. Moving towards rational 
design, 859 native S. cerevisiae promoters were characterized using flow cytometry 
to generate a model relating maximal expression to short oligo motifs (K-mers) 
which impact these steps [86]. Although this model only accounted for 25% of the 
variance in an aggregate test promoter set, it nonetheless mapped expression-enhancing
and repressing characteristics to short motifs in the core promoter to
allow prediction of novel synthetic promoters. These results were improved upon 
via construction and high-throughput characterization of 13,000 specifically 
designed synthetic core promoters [87], leading to a model relating expression to 
the presence and orientation of consensus core promoter regions. However, despite 
analysis of thousands of systematically designed core promoters, the design rules 
for sequence level specification of core promoter activity are much less understood 
than those for UAS manipulation. 


Promoter and Terminator Discovery and Engineering 31 
4.2 Thermodynamic Modeling and Prediction of Promoters 


To fully expedite the synthetic biology design cycle, it is desirable to develop 
methods to design entire promoters de novo for predictable expression. In
prokaryotes, thermodynamic models of ribosome interaction with mRNA secondary
structure have been constructed to calculate the proportion of bound RBS-mRNA 
complexes, and thus translation rate [88, 89]. A thermodynamics-based RBS 
calculator was able to predict expression levels within a factor of 2.3 over an 
expression range of five orders of magnitude. Most importantly, this RBS calculator 
takes into account variations in translation rate depending on the genetic context of 
the RBS, thus allowing a “forward engineering” approach for novel applications. 
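The two-state thermodynamic picture behind such calculators can be sketched as follows. The sketch assumes that the fraction of bound RBS-mRNA complexes follows a Boltzmann factor of the total binding free energy ΔG_total (with concentration terms folded into ΔG), and that the translation initiation rate is proportional to exp(-β·ΔG_total). The value β = 0.45 mol/kcal is an assumed default in the range reported for the RBS calculator, and the function names are hypothetical.

```python
import math


def bound_fraction(delta_g, beta=0.45):
    """Two-state equilibrium fraction of bound complexes.

    delta_g: binding free energy in kcal/mol (more negative = tighter binding).
    beta: assumed inverse thermal energy scale in mol/kcal.
    K = exp(-beta * dG) and bound fraction = K / (1 + K) = 1 / (1 + exp(beta * dG)).
    """
    return 1.0 / (1.0 + math.exp(beta * delta_g))


def relative_rate(delta_g_a, delta_g_b, beta=0.45):
    """Ratio of initiation rates of RBS variants a and b when rate ~ exp(-beta * dG_total)."""
    return math.exp(-beta * (delta_g_a - delta_g_b))


print(bound_fraction(-5.0), relative_rate(-2.0, 0.0))
```

Under the assumed β, a 2.3-fold rate difference corresponds to a free-energy difference of ln(2.3)/0.45, or roughly 1.85 kcal/mol, which gives a sense of how accurately ΔG must be predicted.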

Although eukaryotic transcriptional regulation involves countless protein factor
binding events prior to transcription initiation, it is nonetheless possible to
thermodynamically model individual steps as a surrogate for the transcription
initiation rate. A thermodynamic model incorporating both TF-DNA and TF-TF
interactions was trained on a promoter library containing different TFBS
combinations, using “effective TF concentration” as a free parameter to fit the
data [90]. Overall, the model explained 56% of the variance in expression across a
wide variety of TFBS arrangements, thus laying a foundation for de novo design of
regulatory logic at the DNA sequence level.
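Models of this kind are commonly written in the Shea-Ackers style: each occupancy state of the promoter gets a statistical weight, and expression is the probability-weighted activity of those states. The two-site sketch below illustrates how an effective TF concentration (folded into the weights q1 and q2) and a TF-TF cooperativity factor enter such a model. It is a generic illustration under these assumptions, not the specific model of [90], and the per-state activities are hypothetical.

```python
def two_site_occupancy(q1, q2, omega=1.0):
    """Probability of each occupancy state for a promoter with two TF binding sites.

    q1, q2: dimensionless weights (effective TF concentration x association constant).
    omega:  TF-TF cooperativity (>1 cooperative, <1 antagonistic, 1 independent).
    """
    weights = {
        "empty": 1.0,
        "site1": q1,
        "site2": q2,
        "both": omega * q1 * q2,
    }
    z = sum(weights.values())  # partition function
    return {state: w / z for state, w in weights.items()}


def predicted_activity(q1, q2, omega, activity):
    """Expected expression as the occupancy-weighted sum of per-state activities."""
    probs = two_site_occupancy(q1, q2, omega)
    return sum(probs[state] * activity[state] for state in probs)


activity = {"empty": 0.0, "site1": 1.0, "site2": 1.0, "both": 3.0}  # hypothetical
print(predicted_activity(2.0, 3.0, omega=5.0, activity=activity))
```

With omega = 1 the partition function factorizes into (1 + q1)(1 + q2), so each site is occupied independently; fitting omega and the effective concentrations to library data is what turns this template into a predictive model.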

To generalize this model further, other events in transcription initiation have
been considered. The complex between the TATA box and TATA-binding protein
(TBP) forms as a first step in the recruitment of RNAP II [91]; thermodynamic
modeling of this complex and redesign of promoters with different consensus TATA
boxes produced a promoter library whose expression scaled predictably with the
thermodynamic affinity of TBP for each TATA box [92]. Incorporating the
thermodynamic model for the TBP-TATA complex into the previously developed
model for TF-RNA polymerase II and TF-TF binding [90] explained 75% of the
variance in promoter expression across a wide variety of genetic contexts. These
examples demonstrate the utility of thermodynamically modeling transcription
initiation steps as a means to predict expression. Because promoter discovery is
particularly important for uncharacterized mammalian hosts, thermodynamic
sequence-level approaches have been used to predict novel promoters based on DNA
structural properties such as duplex stability and bendability [93, 94]. In addition,
mammalian promoter regions have been modeled at the sequence level using an
“alpha score,” which describes the likelihood that a genomic region contains a
promoter based on its nucleotide composition. Remodeling the promoter of the
X-linked cancer/testis antigen 1A gene to have twice the alpha score improved
expression, albeit in a non-quantitative manner [95]. Although these techniques can
predict high expression, they are limited because they cannot design promoters
de novo with prescribed expression. Nevertheless, they demonstrate the potential to
use heuristic models for the design and prediction of DNA function.
4.3 Prediction and Rational Modulation of Promoter Nucleosome Occupancy


In eukaryotes, the secondary structure of promoter DNA wound around
nucleosomes controls access to the transcription machinery [96]. As a result, the rational
design of novel promoters must consider how primary sequence contributes to
DNA secondary structure. Nucleosome occupancy at promoters strongly regulates
gene expression because nucleosome binding can occlude TFBSs and block RNAP II
recruitment to the core promoter [97]. Accordingly, rational addition of a tunable
nucleosome-disfavoring poly(dA:dT) element [91, 98, 99] upstream of the natural
Gcn4p binding site in a synthetic His3-based promoter library afforded predictable
control over nucleosome occupancy and thus expression [100]. Similarly, mutation
of CpG islands known to be prone to methylation and histone-mediated silencing
eliminated promoter silencing during long-term transgene expression in embryonic
stem cells [101]. Thus, nucleosome-disfavoring sequences may be considered part
of the rational eukaryotic promoter engineering toolbox, along with the addition of
hybrid enhancers (Fig. 2e).
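Locating existing poly(dA:dT) tracts and inserting a new one at a chosen position are simple string operations, sketched below. The minimum tract length of 10 bp and the default insertion length of 15 bp are illustrative assumptions, not values taken from [100], and both helper names are hypothetical.

```python
import re


def poly_da_dt_tracts(seq, min_len=10):
    """Find homopolymeric A or T runs of at least min_len bases.

    Returns (start, end, tract) tuples with 0-based, end-exclusive coordinates.
    A run of A on one strand is a run of T on the other, so both count as
    poly(dA:dT). Mixed A/T runs are not reported by this simple version.
    """
    pattern = re.compile(r"A{%d,}|T{%d,}" % (min_len, min_len))
    return [(m.start(), m.end(), m.group()) for m in pattern.finditer(seq.upper())]


def add_poly_dt(promoter, position, length=15):
    """Insert a poly(dT) tract of the given length before `position` (hypothetical design step)."""
    return promoter[:position] + "T" * length + promoter[position:]


designed = add_poly_dt("GCGCGCGCGC", 5, length=12)
print(poly_da_dt_tracts(designed))
```

Varying the inserted length gives a simple tunable knob, which mirrors the idea of a tunable nucleosome-disfavoring element.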

To map nucleosome occupancy to primary sequence for predictive engineering
of promoters, a hidden Markov model (HMM) was trained on a genome-wide
nucleosome map [102]. This model was used to investigate the nucleosome
occupancy of the previously mentioned TEF1 promoter library, demonstrating that
expression correlated inversely and robustly with predicted cumulative nucleosome
occupancy. Using this model, a greedy algorithm was developed that redesigned
native promoters for up to 16-fold greater strength [103]. Furthermore, this approach
was used for the successful de novo design of synthetic yeast promoters.
Importantly, sequence-level prediction of nucleosome occupancy provides a way to
optimize native promoters regardless of genetic context. As a result, future efforts
in this area must consider the precise control of nucleosome occupancy to modulate
expression.
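The coupling of an occupancy predictor with a greedy search can be sketched generically. The HMM of [102] is not reproduced here: `gc_fraction` is only a hypothetical placeholder scorer (GC-rich DNA tends to be more nucleosome-favoring, but this proxy is far cruder than a trained model), and the search details are assumptions rather than the algorithm of [103].

```python
def gc_fraction(seq):
    """Hypothetical, crude stand-in for a predicted nucleosome-occupancy score."""
    return sum(base in "GC" for base in seq) / len(seq)


def greedy_redesign(seq, score, protected=frozenset(), max_rounds=50):
    """Repeatedly apply the single-base substitution that most lowers score(seq).

    Positions in `protected` (e.g. TF binding sites or the TATA box) are never
    mutated. Stops when no substitution improves the score or after max_rounds.
    Returns (redesigned_sequence, final_score).
    """
    current, best = seq, score(seq)
    for _ in range(max_rounds):
        candidate, cand_score = None, best
        for i, base in enumerate(current):
            if i in protected:
                continue
            for alt in "ACGT":
                if alt == base:
                    continue
                mutant = current[:i] + alt + current[i + 1:]
                s = score(mutant)
                if s < cand_score:  # keep the best strictly improving mutant
                    candidate, cand_score = mutant, s
        if candidate is None:
            break
        current, best = candidate, cand_score
    return current, best


print(greedy_redesign("GGGGCCCCGGGG", gc_fraction, protected={0, 1}))
```

Protecting functional positions is what keeps such a search from destroying the binding sites the promoter needs; a real design would also cap the number of mutations.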


4.4 Design of Synthetic Promoters with Controlled Chromatin Environment


Beyond nucleosome models, the broader chromatin context of eukaryotic DNA is
important in considering promoter function. Specifically, eukaryotic DNA is
wound around histone octamers in 147-bp increments and packaged tightly
to create the “beads-on-a-string” backbone of chromatin [104]. This
structure is not random; in fact, the structure of chromatin surrounding
genes has a direct impact on their regulation [105-111]. Thus, any endeavor to
engineer promoters rationally as synthetic biology “parts” that exhibit defined
functions in any genetic context must take into account the chromatin environment
of the promoter.
The first step towards any rational, bottom-up synthetic biology engineering
approach is to parse design rules from the native system. To create design rules
for chromatin-based control, a combinatorial library of zinc finger-based synthetic
transcription factors was created with specific yeast chromatin regulators (CRs)
tethered as the activation domain [112]. These CRs impact gene expression by
regulating PIC formation, remodeling and assembly of nucleosomes, chromatin
accessibility via histone modification, and transcriptional elongation. This
library screening approach delineated many different classes of CRs: activators
and repressors, synergistic regulators, spatially encoded regulators that could
repress transcription from a non-canonical position downstream of genes, and CRs
that could activate or repress multiple genes simultaneously over a long range of
genomic space. These minimal chromatin-based components can thus act as
synthetic “parts” to create a diverse array of transcriptional logic and predictably
tune expression by altering chromatin state. These initial efforts represent a first
step toward accounting for the broader genetic context of promoters.

In closing, promoter discovery and characterization have progressed from genome
mining to random mutagenesis to combinatorial and rational design. In some of
these later cases, computational models have sped up the
design-build-test cycle. Although limitations still exist with respect to inducible
promoters, purely synthetic design, and maximal expression levels, the field has
progressed rapidly in recent years.


5 Terminator Discovery and Characterization 


In addition to promoters, terminators serve as an important control point when
tuning expression in circuits and pathways [113, 114]. Until recently, however,
terminators had not been catalogued as extensively as promoters. In fact, most
commonly used terminators are relics from past experiments and are often not the
most efficient. For example, commonly used terminators such as the native
bacteriophage T7 terminator exhibit low termination efficiencies, meaning that
transcriptional flux continues through the expression cassette, which affects the
regulation of downstream genes and limits polymerase recycling [113-115].
Furthermore, the collection of terminators available to researchers has traditionally
been much smaller in breadth than that of promoters [116], which limits the
construction of large pathways and circuits because reusing the same terminator
raises the risk of genetic instability via homologous recombination [117, 118].
Terminators also serve as a control point to tune expression in eukaryotes via the
stability of the 3’ end of the mRNA transcript [119-121]. Thus, the pool of commonly
used terminators must be diversified to meet pathway specifications via both
discovery and engineering techniques. We highlight various approaches from
terminator mining to synthetic design and models in the following sections.
5.1 Native Terminator Mining


To diversify beyond the commonly used terminators in E. coli, an
extensive library of 582 natural and synthetic terminators [122, 123] was
constructed and analyzed for termination efficiency [124]. To enable further
terminator engineering, the study also delineated terminator design rules based on a
mechanism in which RNAP stalls at the U:A tract, allowing an RNA hairpin to form
within the RNA exit channel and terminate transcription. It was shown that the
composition of the terminator U-tract effectively controls polymerase dissociation
and can thus be rationally designed to tune terminator strength. This work was
one of the more exhaustive studies in bacteria to identify alternative terminators
for synthetic constructs.
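The chapter does not spell out how termination efficiency is quantified. One frequently used approach, assumed here, places the terminator between two reporters and compares the downstream/upstream signal ratio with that of an otherwise identical construct lacking the terminator. The function and argument names are hypothetical.

```python
def termination_efficiency(up_term, down_term, up_ref, down_ref):
    """Fraction of transcripts that terminate, from a dual-reporter measurement.

    up_*:   signal of the reporter upstream of the tested terminator
    down_*: signal of the reporter downstream of it
    *_term: construct carrying the terminator; *_ref: identical construct without it
    Normalizing by the upstream reporter corrects for differences in
    transcription initiation between constructs.
    """
    readthrough = (down_term / up_term) / (down_ref / up_ref)
    return 1.0 - readthrough


# Hypothetical reporter readings (arbitrary units).
print(termination_efficiency(up_term=120.0, down_term=9.0, up_ref=100.0, down_ref=60.0))
```

A value near 1 means almost no read-through into the downstream gene, which is the property that matters for insulating neighboring cassettes.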

In contrast to prokaryotic intrinsic termination, eukaryotic mRNA transcript
stability is regulated by recruited protein factors such as the cleavage and
polyadenylation specificity factor (CPSF) and cleavage stimulation factor (CstF)
[125]. Thus, terminators must be characterized not only by their termination
efficiency but also by their impact on mRNA and protein levels. Yamanishi et al.
undertook the first genome-scale flow cytometry characterization of yeast
terminators, determining that the majority of terminators enabling higher expression
from a synthetic construct came from ribosomal protein genes [120]. A separate,
high-capacity terminator library was constructed by selecting a subset of terminators
originating from genes shown to have longer mRNA half-lives [121].
Characterization of this library established a direct relationship between terminator
strength and mRNA half-life, thus laying the groundwork for terminator design rules.
In addition, the utility of these alternative terminators was demonstrated by improved
pathway flux, even when they were paired with promoters of similar or lower strength
than those originally used with a “traditional” terminator. Thus, terminators clearly
serve as an important synthetic part that must be rationally specified to tune
expression for metabolic engineering applications.
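Why a longer mRNA half-life translates into higher expression can be seen from a minimal kinetic model, assumed here for illustration rather than taken from [121]: with constant synthesis and first-order decay, the steady-state transcript level equals the synthesis rate divided by the decay constant, so it scales linearly with half-life.

```python
import math


def degradation_rate(half_life_min):
    """First-order decay constant (per minute) corresponding to an mRNA half-life."""
    return math.log(2) / half_life_min


def steady_state_mrna(synthesis_rate, half_life_min):
    """Steady-state mRNA level for constant synthesis and first-order decay.

    d[mRNA]/dt = k_syn - k_deg * [mRNA], so at steady state [mRNA] = k_syn / k_deg.
    """
    return synthesis_rate / degradation_rate(half_life_min)


# Hypothetical numbers: same promoter (synthesis rate), two terminators.
print(steady_state_mrna(5.0, 10.0), steady_state_mrna(5.0, 20.0))
```

All else being equal, a terminator that doubles transcript half-life would be expected to double the steady-state mRNA level, which is consistent with stronger terminators compensating for weaker promoters.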


6 Rational Construction of Terminators with Desired Characteristics


6.1 Hybrid Terminator Engineering 


Similar to promoters, the hybrid engineering approach has yielded synthetic
terminators with enhanced efficiencies. Multiple combinations of both native and
synthetic termination signals were used to enhance the termination efficiency of the T7
terminator while retaining its orthogonality [126]. However, this hybrid approach
faces limitations in eukaryotes because termination is a highly concerted process
regulated by multiple disparate elements (Fig. 3a).
Fig. 3. Evolution of terminators in S. cerevisiae. (a) Unlike promoters, native terminators consist
of many defined consensus motifs. (b) A minimal terminator scaffold was created by spacing these 
terminator motifs 10 bp apart. (c) This scaffold was engineered by rationally modifying the linkers 
between consensus motifs, adding upstream and downstream elements, and changing the length 
and sequence of consensus motifs. Licensing: Reprinted with permission from Curran KA, Morse 
NJ, Markham KA et al. (2015) Short, Synthetic Terminators for Improved Heterologous Gene 
Expression in Yeast. ACS Synth. Biol. 2015, 4, 824-832. doi:10.1021/sb5003357. Copyright 
© 2015 American Chemical Society 


6.2 Synthetic Terminator Scaffolds and Libraries 


To overcome the limitations of hybrid terminator engineering in yeast, a synthetic
minimal terminator scaffold (TGuo1) was constructed by stringing together defined
consensus efficiency, positioning, and polyadenylation elements, which cooperate
in the cleavage and 3’ polyadenylation of the mRNA transcript (Fig. 3b) [127]. This
minimal scaffold was both diversified and enhanced using modified consensus
termination elements and mRNA stability elements [128] to produce a library of
rationally designed synthetic terminators (Fig. 3c) that were functional in
multiple hosts and improved CAD1 expression for itaconic acid production [129].
Importantly, this technique allowed delineation of design rules based on consensus
element identity and spacing, enabling potential rational design of synthetic
terminators. The resulting terminators were much shorter than native terminators,
with the additional benefit of enhanced mRNA stability and increased protein
production. Thus, as with the promoters described above, once a
fundamental understanding of molecular function is obtained, synthetic part design
can proceed.
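The scaffold idea (defined consensus elements separated by fixed-length linkers, 10 bp apart in Fig. 3b) can be expressed as a small assembly function. The motif strings below are illustrative placeholders only; the elements used in [127] may differ in sequence, order, and number, and the random linkers are a simplifying assumption.

```python
import random

# Placeholder yeast-like motifs for illustration only.
EFFICIENCY_ELEMENT = "TATATA"
POSITIONING_ELEMENT = "AATAAA"


def assemble_terminator(elements, linker_len=10, seed=0):
    """Join terminator elements 5'->3' with random linkers of fixed length.

    The linkers are random placeholders; a real design would screen them,
    for example to avoid accidentally creating extra consensus motifs.
    """
    rng = random.Random(seed)
    parts = []
    for i, element in enumerate(elements):
        if i > 0:
            parts.append("".join(rng.choice("ACGT") for _ in range(linker_len)))
        parts.append(element)
    return "".join(parts)


print(assemble_terminator([EFFICIENCY_ELEMENT, POSITIONING_ELEMENT]))
```

Because every element and linker is an explicit parameter, element identity, spacing, and linker sequence can each be varied systematically, which is the kind of variation from which design rules were delineated.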
7 Sequence-Level Prediction and Specification of Terminators


Although the previously described methods of synthetic terminator design allow
rational diversification of the terminator library, they are nonetheless limited by
natural sequence space. Purely de novo design of terminators requires a fundamental
understanding of the constraints underlying terminator function. Initial studies
have begun to elucidate the underlying design principles for terminators; however,
this area lags behind the progress made with promoters described above.


7.1 Terminator Characterization and Standardization 


To this end, high-throughput studies have been carried out to measure
quantitatively the performance of terminators and determine predictive sequence features
for design in both prokaryotes and eukaryotes. For instance, systematic variation of
terminator U-tract and hairpin stem-loop sequences in the aforementioned E. coli
terminator library [124] afforded optimal expression-enhancing consensus
sequences for rational construction of synthetic terminators.

Both native and synthetic terminator libraries have been constructed and
characterized to tease apart the functions of different terminator motifs [130] in
regulating mRNA abundance in yeast [131, 132]. Characterization of these libraries
showed that the AU-rich efficiency element upstream of the poly(A) site plays a
major role in 3’ end processing and transcription termination. In addition,
terminators were broken down into mono- and di-nucleotide K-mers, leading to
identification of dA:dT elements as a major determinant of terminator strength.
From these studies, it appears that terminators can be broken down into a collection
of tunable elements for rational design.
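Breaking terminators into mono- and dinucleotide K-mers amounts to computing word frequencies that can then be correlated with measured strength. The sketch below computes those features and an A+T fraction as a simple proxy for dA:dT richness; it illustrates the feature step only, and the function names are hypothetical.

```python
from itertools import product


def nucleotide_frequencies(seq, k):
    """Relative frequency of every possible k-mer (k=1: mononucleotides, k=2: dinucleotides)."""
    seq = seq.upper()
    total = len(seq) - k + 1
    if total <= 0:
        raise ValueError("sequence shorter than k")
    counts = {"".join(p): 0 for p in product("ACGT", repeat=k)}
    for i in range(total):
        word = seq[i:i + k]
        if word in counts:  # skip words containing N or other ambiguity codes
            counts[word] += 1
    return {word: c / total for word, c in counts.items()}


def at_content(seq):
    """Fraction of A and T bases, a simple proxy for dA:dT richness."""
    freqs = nucleotide_frequencies(seq, 1)
    return freqs["A"] + freqs["T"]


print(at_content("TTTTTATATAAAGAAATTGC"))
```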


7.2 Thermodynamic Modeling and Prediction of Terminators


To generate a finer continuum of terminator function, it has become necessary to
engineer entirely synthetic terminator sequences based on known design rules and
thermodynamic prediction. In prokaryotes, multiple biophysical models have been
developed to predict terminator strength based on elementary steps in termination,
including U:A hybrid formation, hairpin formation, and mRNA transcript
dissociation [122, 133, 134]. Training one of these models on a set of natural and synthetic
terminators over a large dynamic range in termination efficiencies afforded a linear
sequence-function model with a high coefficient of determination (R² = 0.81) [134].
In S. cerevisiae, however, terminator function is much less predictable based
simply on distinct sequence elements whose function is determined by biophysical
models. In fact, characterization of the aforementioned rationally designed
synthetic library [129] demonstrated that consensus termination motifs were not
entirely additive. This suggests that a fundamental code underlying termination
in yeast remains to be uncovered before thermodynamic prediction becomes
feasible. However, because they have a more rigidly defined architecture than
promoters, yeast terminators are highly amenable to rational engineering for desired
characteristics. Thus, fundamental models describing eukaryotic termination and
mRNA half-life stabilization are required to advance the field of terminator engineering.
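A linear sequence-function model of the kind cited for prokaryotic terminators is, at its simplest, an ordinary least-squares fit whose quality is summarized by R². The single-feature sketch below shows that calculation. The feature (for example, a predicted hairpin free energy) and the data values are hypothetical, and the model in [134] may combine several biophysical terms.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = a + b*x; returns (a, b, r_squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return a, b, 1.0 - ss_res / ss_tot


# Hypothetical feature values and measured log terminator strengths.
feature = [-4.0, -6.5, -8.0, -10.5, -12.0, -15.0]
log_strength = [0.3, 0.7, 1.1, 1.3, 1.8, 2.1]
print(fit_line(feature, log_strength))
```

When motif contributions are not additive, as reported for the yeast library, a single linear model of this form fits poorly, which is one concrete symptom of the missing "code".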


8 Future Directions in Promoter and Terminator Engineering


Improved promoters and terminators help shorten the design cycle.
Optimal design of these elements must meet three criteria: robustness,
orthogonality, and predictable tunability. Promoters and terminators must be robust in that
they function consistently regardless of genetic background, genetic context, and
cellular environment [135]. In this regard, unexpected deviation from desired
promoter or terminator function is a severe hindrance to the rapid development of
circuits and pathways, leading to multiple iterations of the design cycle. To improve
robustness, efforts have been made to create synthetic promoter scaffolds based on
highly constitutive promoters that function consistently across many different
cellular environments. However, to date, few significant efforts have been made to
engineer eukaryotic promoters that are robust to differing genetic contexts. These
efforts are also complicated by the fact that eukaryotic promoters are highly
regulated by the chromatin environment in which they are placed. It is thus
imperative to develop design rules that govern the chromatin environment of
promoters and terminators in order to predict and control these factors for optimal
gene expression. Purely orthogonal elements promise to bypass some of these
robustness issues, as such promoters and terminators appear to function more
universally. Overall, considerable progress has been made in the past 5 years in
providing novel expression capabilities through promoters and terminators. However,
because of the regulatory complexity of microbial hosts, new techniques must be
developed to predict and design promoters and terminators for desired function.
Nevertheless, these new synthetic parts have greatly improved the ability to engineer
strains for metabolic engineering and synthetic biology applications.
Engineering Biomolecular Switches for Dynamic Metabolic Control


Cheng-Wei Ma, Li-Bang Zhou, and An-Ping Zeng 


Abstract Living organisms have been exploited as production hosts for a large
variety of compounds. To improve the efficiency of bioproduction, metabolic
pathways in an organism are usually manipulated by various genetic modifications.
However, bottlenecks during the conversion of substrate to a desired product may
result from cellular regulation at different levels. Dynamic regulation of metabolic
pathways according to the needs of the cultivation process is therefore essential for
developing effective bioprocesses, but represents a major challenge in metabolic
engineering and synthetic biology. To this end, switchable biomolecules that can
sense the intracellular concentrations of metabolites with different response types
and dynamic ranges are of great interest. This chapter summarizes recent progress
in the development of biomolecular switches and their applications for improving
bioproduction via dynamic control of metabolic fluxes. Perspectives for further
studies of bioswitches and their applications in industrial strain development are
also discussed.


Keywords Biomolecular engineering, Bioswitches, Cellular regulation, Dynamic 
metabolic control, Strain development 
1 Introduction


Industrial biotechnology focuses on the development of living organisms,
especially microorganisms, as production hosts for a large variety of compounds with
potential uses in the chemical, food, pharmaceutical, agriculture, and health care
industries. With the advent of synthetic biology, compounds that previously
could not be synthesized by natural microorganisms can now be produced by
combining metabolic pathways from different organisms into a single host
[1-3]. However, although microorganisms have been developed and used to produce
various chemicals and materials, their production efficiency is often not high
enough to make production costs competitive with traditional fossil-based routes.
Approaches that can improve the production efficiency of organisms are therefore
of great importance. As a widely used strategy, metabolic engineering has served as
a powerful approach to overcome bottlenecks in bioproduction processes with
microbial hosts such as Escherichia coli and Saccharomyces cerevisiae.

Metabolic engineering has so far been successfully applied to construct efficient
bioproduction strains by manipulating metabolic pathways in an organism.
Generally speaking, the strategies traditionally used in metabolic engineering to
generate high producers [4] are (1) enhancement of precursor supply and of the
transport of both substrates and products by gene overexpression, (2) removal, by
gene knockout, of enzymes involved in competing pathways and of enzymes that may
degrade products, and (3) downregulation of enzymes taking part in competing but
essential pathways via promoter engineering. These strategies are “static” because
they introduce permanent, unchangeable modifications at the genetic level [5]. As a
consequence, these modifications can impose unwanted burdens on the organisms
when cellular and environmental conditions change. In particular, “static” strategies
cannot efficiently deal with competing pathways that are essential for cell growth,
because the optimal ratio between the production pathways and the competing
pathways is not fixed.

On the other hand, bottlenecks during the conversion of substrate to a desired
product in microorganisms may result from cellular regulation at different levels.
Improving production strains by manipulating the dynamics of metabolic
pathways and fluxes to account for changing cellular and environmental conditions
is more desirable but also more challenging, and it cannot be realized using the
static control methods found in most current metabolic engineering practice.
Moreover, synthesis of a desired product always involves multiple metabolic
pathways, and these pathways often exhibit specific properties suitable for
production under distinct conditions and in distinct host organisms [6]. Concerted
dynamic control is thus needed when synergy is required among different metabolic
pathways, e.g., with regard to reducing-equivalent demand, cofactor preferences,
and intermediate utilization [7]. Thus, dynamic metabolic control of related
pathways, as a complement to the strategies used in current metabolic engineering,
may lead to increased productivity and yield. This chapter summarizes recent
progress in the development of biomolecular switches and their applications for
improvement of bioproduction via dynamic metabolic control.


2 Natural Bioswitches 


For dynamic control of metabolic fluxes, cellular entities or devices able to regulate
metabolic activities in response to input signals are required; biomolecules that
can fulfill this requirement are called bioswitches. Because organisms are
exposed to a variety of conditions in their environment, such as varying
temperatures, availability of different nutrients, exposure to toxins, and products of
their own metabolism, they need to be able to adjust rapidly to changing conditions.
Various bioswitches have thus evolved in nature and have been discovered by
researchers. They can be proteins that function in signaling pathways or participate
in the transcription process, or allosteric enzymes that catalyze metabolic
reactions. They can also be non-coding RNAs such as riboswitches. These natural
bioswitches can respond to various input signals, and regulation can occur at
different cellular levels (Table 1). Specifically, environmental signals are
transduced by two-component regulatory systems and expression of the
corresponding genes is modulated at the DNA level, whereas intracellular
metabolites can be sensed by one-component regulatory systems at the transcription
level, by RNA-based bioswitches at either the transcription or translation level, or
even at the protein level via allosteric enzymes.

For the bioproduction of a particular compound in industrial biotechnology,
bioswitches able to bind at least two molecules are of particular interest.
For these bioswitches, the function of the downstream target can be modulated by
the binding of small molecules, which are usually intermediates of the
biosynthetic pathway or the target product itself. In this chapter, only recent studies
on bioswitches sensitive to small molecules and used for dynamic metabolic control
are covered (Fig. 1). For discussion of biosensors usually coupled with reporter
genes such as green fluorescent protein (GFP) for strain screening, readers are
referred to other recent reviews (e.g., [8, 9]).
Table 1 Natural bioswitches that regulate cellular metabolism

Input signals
• Light, temperature, pressure, etc.
• Extracellular molecules (e.g., nutrients)
• Intracellular molecules (e.g., metabolites)

Bioswitches
• Signaling proteins (e.g., two-component regulatory system)
• Transcription factors (e.g., one-component regulatory system)
• RNA (e.g., riboswitches)
• Allosteric enzymes

Output regulations
• Transcription
• Translation
• Metabolic reaction


2.1 DNA-Level Bioswitches 


Regulation at the DNA level is mainly mediated by transcription factors. Apart from
RNA polymerase itself, the proteins that initiate and regulate the transcription of
genes are called transcription factors, and they form a large and diverse group. What
distinguishes transcription factors from other proteins is that they have DNA-binding
domains and can therefore bind to specific DNA sequences. Transcriptional
regulation is the most common mode of controlling gene expression, allowing each
gene to be expressed according to the changing environment. Because of their
DNA-binding specificity, transcription factors can be employed as bioswitches to
regulate metabolic fluxes once the expression of the target enzymes is placed under
the control of a specific transcription factor, and this strategy is facilitated by the
diversity of transcription factors existing in nature (Table 2).

As a key mechanism linking environmental signals to cellular responses,
two-component regulatory systems (Fig. 1a) enable organisms to sense, respond to,
and adapt to a wide range of environmental factors, including nutrients, cellular
redox state, changes in osmolarity, quorum signals, antibiotics, pH, and even
physical factors such as light and temperature [10]. Although only a few
two-component systems have been identified in eukaryotes, they are widely
distributed in prokaryotes. Some bacteria contain as many as 200 two-component
systems to transmit different input signals to appropriate outputs [11]. In contrast,
one-component regulatory systems are bacterial transcription regulators in which a
single protein serves as both metabolite sensor and transcription regulator, because it
contains both an “input domain” and an “output domain” (Fig. 1b). Because control
of gene expression via one-component systems is more common and more diverse
in bacteria and archaea than control via two-component systems, it has been
speculated that one-component systems evolved before two-component systems
and may even have served as their evolutionary precursors [12]. The mechanisms of
many families of one-component transcription regulation systems have been
characterized at the structural level [13, 14]. One-component transcription
regulation systems identified in sequenced genomes can be assembled into families
based on their sequence similarity, predominantly in their DNA-binding
helix-turn-helix (HTH) domain. These families often include regulators with
significant similarity in DNA binding but divergent metabolite-sensing domains, and
they tend to control expression of genes involved in related functions.

Fig. 1 Regulation of metabolic fluxes by bioswitches at different molecular levels. (a)
Two-component regulatory system. H histidine; D aspartate; P phosphorylation. (b)
One-component regulatory system. (c) Riboswitches that function by stimulating transcription
termination. (d) Riboswitches that function by interrupting translation initiation. (e) Allosteric
enzymes with feedforward activation. (f) Allosteric enzymes with feedback inhibition.
Table 2 Ligand-sensitive DNA regulators

Transcription factor | Source | Ligand | Refs.
PcaU | Acinetobacter | 3,4-Dihydroxybenzoate | [80]
FapR | B. subtilis | Malonyl-CoA | [67, 81]
QdoR | B. subtilis | Kaempferol, quercetin | [82]
CysR | C. glutamicum | O-Acetyl-(homo)serine | [83]
Lrp | C. glutamicum | L-Valine, L-leucine, L-isoleucine, L-methionine | [84-86]
LysG | C. glutamicum | L-Lysine, L-arginine, L-histidine | [87, 88]
AraC | E. coli | Lycopene | [89]
DcuS | E. coli | Succinate | [90]
DcuR | E. coli | Succinate | [90]
FadR | E. coli | Fatty acyl-CoA | [65]
LacI | E. coli | IPTG, lactose | [91]
SoxR | E. coli | NADPH | [92]
TyrR | E. coli | L-Tyrosine | [89]
FdeR | Herbaspirillum seropedicae | Naringenin | [82]
BenR | P. putida | Benzoate | [93]
NahR | P. putida | Benzoic acids | [94]
PcaR | P. putida | β-Ketoadipate | [90]
BmoR | Thauera butanivorans | 1-Butanol | [90]


The most abundant type of transcriptional regulator in the prokaryotic kingdom
is the LysR family [15]. The conserved overall structure comprises an N-terminal
DNA-binding domain linked to a C-terminal effector-binding region made of two
α/β subdomains connected by two short polypeptide segments. These connecting
segments form a hinge or cleft that accommodates the small-molecule effector.
Despite considerable structural and functional conservation, LysR-type
transcriptional regulators control a diverse set of genes involved in both primary
and secondary metabolism. The effector molecules have been identified for some of
the LysR-type regulators; they include substrates, products, and intermediates of
the pathways under their control, as well as related metabolites [16]. In particular,
members of the LysR family play diverse roles in controlling amino acid
biosynthesis. In addition to LysR, which acts as an activator in lysine biosynthesis,
GltC activates glutamate biosynthesis, and isoleucine/valine biosynthesis is under
the control of IlvR and IlvY. The transport of arginine is regulated by ArgP. Besides
these local regulators, some LysR-type regulators function globally; for example,
MetR regulates the biosynthesis and transport of both cysteine and methionine.

Chromatin immunoprecipitation (ChIP) is an important experimental technique for
discovering transcription factor binding sites. It can be used to study interactions
between specific proteins and DNA in the cell and to determine where these proteins
bind in the genome. In recent years, the combination of ChIP


Engineering Biomolecular Switches for Dynamic Metabolic Control 51 


with second-generation DNA sequencing (ChIP-seq) has enabled precise functional
genomic assays, especially the genome-wide mapping of transcription factor binding
sites, the elucidation of molecular mechanisms underlying differential gene
regulation by specific transcription factors, and the identification of epigenetic
marks [17]. For the analysis of ChIP-seq data, a novel approach called ChIPModule
has been developed to systematically discover transcription factors and their
cofactors [18]. Given a ChIP-seq dataset and the binding patterns of a large number
of transcription factors, ChIPModule can efficiently identify groups of transcription
factors whose binding sites significantly co-occur in the ChIP-seq peak regions.
Tests on simulated and experimental data showed that ChIPModule not only
identifies known cofactors of transcription factors but also predicts new ones.
Although ChIP-seq experiments provide an unprecedented opportunity to discover
binding motifs, which is important for the study of transcriptional regulation,
de novo motif discovery methods often neglect underrepresented motifs in ChIP-seq
peak regions. To address this issue, a novel approach called SIOMICS has been
developed to discover motifs from ChIP-seq data [19, 20]. Compared with other
motif discovery methods, SIOMICS showed advantages in speed, in the number of
known cofactor motifs predicted in experimental datasets, and in predicting fewer
false motifs in random datasets.
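The core question addressed by ChIPModule, whether the binding sites of two transcription factors co-occur in peak regions more often than expected by chance, can be illustrated with a simple hypergeometric test. The sketch below is a generic illustration, not the ChIPModule algorithm: it assumes each peak is already annotated with the set of transcription factors whose motifs it contains, applies no multiple-testing correction, and uses hypothetical function names (`hypergeom_sf`, `cooccurring_pairs`) and toy data.

```python
from itertools import combinations
from math import comb


def hypergeom_sf(k, n_total, n_success, n_draws):
    """P(X >= k) for X ~ Hypergeometric(population n_total, n_success successes, n_draws draws)."""
    upper = min(n_success, n_draws)
    if k > upper:
        return 0.0
    tail = sum(comb(n_success, i) * comb(n_total - n_success, n_draws - i)
               for i in range(max(k, 0), upper + 1))
    return tail / comb(n_total, n_draws)


def cooccurring_pairs(peaks, alpha=0.01):
    """Return (tf_a, tf_b, n_shared_peaks, p_value) for TF pairs whose motifs co-occur
    in more peaks than expected by chance (no multiple-testing correction)."""
    n_peaks = len(peaks)
    tfs = sorted(set().union(*peaks))
    hits = {tf: sum(tf in peak for peak in peaks) for tf in tfs}
    results = []
    for a, b in combinations(tfs, 2):
        both = sum(a in peak and b in peak for peak in peaks)
        p_value = hypergeom_sf(both, n_peaks, hits[a], hits[b])
        if p_value < alpha:
            results.append((a, b, both, p_value))
    return sorted(results, key=lambda r: r[3])


# Hypothetical toy data: 20 peaks, each annotated with the TF motifs found in it.
peaks = [{"A", "B"}] * 8 + [{"C"}] * 6 + [{"A"}] + [{"B", "C"}] + [set()] * 4
print(cooccurring_pairs(peaks))  # only the ("A", "B") pair, sharing 8 peaks, is reported
```

A realistic analysis would also need a background model that accounts for peak length and sequence composition, which this sketch ignores.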


2.2 RNA-Level Bioswitches 


In recent years it has become evident that posttranscriptional regulation mediated
by RNA regulators is critical to many cellular processes in both prokaryotes and
eukaryotes [21]. Cells use these RNA regulators to respond rapidly to various
environmental signals and stresses. Among them, riboswitches are attracting great
interest from researchers [22]. A riboswitch is a regulatory segment of a messenger
RNA (mRNA) molecule that binds a small molecule, thereby changing the
production of the proteins encoded by the mRNA in response to the concentration
of its effector molecule [23, 24].

Riboswitches are composed of an aptamer domain and an expression platform.
Aptamers are short nucleic acid sequences capable of binding specific ligands with
high affinity and specificity. When the ligand binds, the conformation of the aptamer
changes, which in turn alters the structure of the expression platform. Because of
the diversity of expression platforms, the same type of riboswitch may regulate gene
expression at different levels. Riboswitch control of transcription termination is most
common in Gram-positive bacteria (Fig. 1c), whereas riboswitch control of
translation initiation is more often found in Gram-negative bacteria (Fig. 1d). A
variety of metabolite-binding riboswitches have been discovered and characterized
[23, 25]. As listed in Table 3, the ligands range in complexity from metal ions to
enzymatic cofactors such as flavin mononucleotide (FMN), S-adenosylmethionine
(SAM), and coenzyme B12.
Table 3 Classification and characterization of riboswitches

Riboswitch | Ligand | Length^a (nt) | Affinity^b | Refs.
Enzymatic cofactor
TPP | TPP | 100-120 | 100 nM | [95, 96]
FMN | FMN | 120-140 | 5 nM | [97]
B12 | AdoCbl | 200-220 | 300 nM | [98, 99]
SAH | SAH | 65-80 | 20 nM | [100, 101]
SAM-I | SAM | 100 | 4 nM | [102]
SAM-II | SAM | 60 | 1 µM | [63]
SAM-III | SAM | 80 | n.d.^c | [103]
SAM-IV | SAM | 60 | 15 µM | [104]
Moco | Moco | 140 | n.d.^c | [105]
THF | THF | 100-120 | 70 nM | [106]
Amino acid
Glycine | Glycine | 100-120 | 30 µM | [107]
Lysine | Lysine | 165-190 | 1 µM | [108]
Glutamine | Glutamine | 60-80 | 150 µM | [109]
GlcN6P | GlcN6P | 170 | 200 µM | [110, 111]
Nucleotide
Adenine | Adenine | 70 | 300 nM | [112, 113]
Guanine | Guanine | 70 | 5 nM | [113, 114]
dG | dG | 70 | 80 nM | [114]
preQ1-I | preQ1 | 40 | 50 nM | [115]
preQ1-II | preQ1 | 25-45 | 100 nM | [115, 116]
c-di-GMP-I | c-di-GMP | 110 | 1 nM | [117]
c-di-GMP-II | c-di-GMP | 90 | 200 pM | [118]
Ion
Mg2+ | Magnesium | 70 | 200 µM | [119]
F− | Fluoride | 110 | 60 µM | [120, 121]

Abbreviations: TPP thiamine pyrophosphate, FMN flavin mononucleotide, B12 coenzyme B12,
AdoCbl adenosylcobalamin, SAH S-adenosylhomocysteine, SAM S-adenosylmethionine, Moco
molybdenum cofactor, THF tetrahydrofolate, GlcN6P glucosamine-6-phosphate, dG
deoxyguanosine, preQ1 prequeuosine1, c-di-GMP cyclic dimeric guanosine monophosphate

^a Size of the aptamer sequence
^b Binding affinity as reported in the corresponding publications
^c Not detected


As cis-acting regulatory elements, riboswitches have potential applications in the
dynamic control of metabolic pathways, for several reasons. First, the extreme
structural flexibility of RNA aptamers enables highly specific recognition of a wide
range of regulatory signals. Second, because riboswitches are RNA molecules, they
are relatively easy to engineer compared with proteins. Third, riboswitches can be
integrated with different target genes, which allows one type of riboswitch to
control the expression of genes in different pathways simultaneously. Together,
these features make riboswitches well suited as genetic devices for dynamic
metabolic control, especially where concerted regulation of multiple pathways is
required.
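How riboswitch output tracks effector concentration can be approximated with a simple equilibrium model. The sketch below assumes single-site binding with free ligand approximately equal to total ligand, and an OFF switch in which the ligand-bound aptamer represses expression down to a residual leak; these assumptions, the leak value, and the helper names are illustrative only, and some natural riboswitches are kinetically rather than thermodynamically controlled. The dissociation constants are taken from Table 3.

```python
def fraction_bound(ligand_nm, kd_nm):
    """Equilibrium aptamer occupancy for single-site binding (free ligand ~ total ligand)."""
    return ligand_nm / (kd_nm + ligand_nm)


def off_switch_expression(ligand_nm, kd_nm, max_expr=1.0, leak=0.05):
    """Relative expression of a hypothetical OFF riboswitch: ligand-bound aptamers repress,
    unbound aptamers allow full expression; `leak` is the assumed residual level."""
    theta = fraction_bound(ligand_nm, kd_nm)
    return leak + (max_expr - leak) * (1.0 - theta)


# Kd values (nM) from Table 3; the ligand concentrations (0, Kd, 9*Kd) are hypothetical.
kd_table = {"FMN": 5.0, "TPP": 100.0, "Lysine": 1000.0}
for name, kd in kd_table.items():
    levels = [round(off_switch_expression(c, kd), 3) for c in (0.0, kd, 9 * kd)]
    print(name, levels)
```

In this model the switching range scales with Kd, so an aptamer with nanomolar affinity saturates at metabolite levels that would barely occupy a micromolar aptamer; matching Kd to the expected intracellular concentration is therefore one design consideration.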


2.3 Protein-Level Bioswitches 


Compared with regulation at the DNA and RNA levels, the control of metabolic fluxes
through allosteric enzymes does not require transcription or translation [5]. The
main feature of allosteric enzymes is that they possess at least two stereospecifically
distinct ligand binding sites: the active site, where the substrate binds, and the
allosteric site, where an allosteric effector binds. Binding of a regulatory molecule at
the allosteric site modifies the properties of the distinct active site. For example, an
apparent change in binding affinity at the active site may increase enzyme activity
when an activator binds or decrease it when an inhibitor binds.
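The effect of an allosteric effector on apparent substrate affinity can be sketched with a phenomenological rate law in which an inhibitor raises, and an activator lowers, the apparent Michaelis constant. This K-type modifier model is an illustrative assumption, not a mechanistic description such as the Monod-Wyman-Changeux concerted model, and the function and parameter names are hypothetical.

```python
def allosteric_rate(s, vmax, km, inhibitor=0.0, ki=1.0, activator=0.0, ka=1.0):
    """Michaelis-Menten rate with an apparent Km modulated by allosteric effectors:
    Km_app = Km * (1 + I/Ki) / (1 + A/Ka)  (phenomenological K-type model)."""
    km_app = km * (1.0 + inhibitor / ki) / (1.0 + activator / ka)
    return vmax * s / (km_app + s)


vmax, km = 10.0, 2.0
print(allosteric_rate(km, vmax, km))                         # no effector: Vmax/2
print(allosteric_rate(km, vmax, km, inhibitor=1.0, ki=1.0))  # inhibitor at Ki: Vmax/3
print(allosteric_rate(km, vmax, km, activator=1.0, ka=1.0))  # activator at Ka: 2*Vmax/3
```

At a substrate concentration equal to Km, an inhibitor present at its Ki lowers the rate from one half to one third of Vmax, whereas an activator present at its Ka raises it to two thirds.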

For metabolic control, allosteric regulation by allosteric enzymes provides natural
examples of control loops, including both feedforward activation by upstream
substrates (Fig. 1e) and feedback inhibition by downstream products (Fig. 1f). This is
found in the negative feedback loops of many biosynthetic pathways, in which one of
the products of the pathway inhibits its own further production by shutting down an
enzyme that catalyzes one of the early steps of the pathway. Alternatively, a pathway
can be activated by the presence of a specific molecule that switches on one of its
crucial enzymes. In general, the first enzyme or a key branch-point enzyme of a
pathway is down-regulated by the pathway’s product. The examples listed in Table 4
were collected from E. coli and are categorized according to their functions in
different metabolic modules. Both feedforward activation and feedback inhibition
occur during the generation of precursor metabolites and energy. Feedback
inhibition is the main mechanism in the biosynthetic pathways of amino acids,
whereas feedforward activation plays key roles in the biosynthesis of nucleosides
and nucleotides.
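To illustrate why end-product feedback inhibition limits product accumulation, the sketch below simulates a hypothetical two-step pathway S → X → P in which P inhibits the first enzyme through a Hill-type term, with a constant substrate level and first-order removal of P (for example, consumption for growth). All rate laws and parameter values are illustrative assumptions, and forward Euler integration is used only for simplicity.

```python
def simulate_pathway(feedback=True, s=1.0, vmax1=1.0, km1=0.5, k2=1.0, kd=0.1,
                     ki=1.0, hill=2.0, dt=0.01, t_end=200.0):
    """Forward-Euler simulation of S -> X -> P; returns (X, P) at t_end.
    v1 = Vmax1*S/(Km1+S), multiplied by 1/(1+(P/Ki)^n) when feedback is on;
    v2 = k2*X; P is removed at rate kd*P."""
    x = p = 0.0
    for _ in range(int(round(t_end / dt))):
        inhibition = 1.0 / (1.0 + (p / ki) ** hill) if feedback else 1.0
        v1 = vmax1 * s / (km1 + s) * inhibition
        v2 = k2 * x
        x += dt * (v1 - v2)
        p += dt * (v2 - kd * p)
    return x, p


_, p_open = simulate_pathway(feedback=False)
_, p_closed = simulate_pathway(feedback=True)
print(round(p_open, 2), round(p_closed, 2))
```

With these parameters the open pathway settles at P = v1/kd, whereas the feedback loop holds P at roughly a quarter of that level, because the first enzyme slows down as P approaches its inhibition constant.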

Table 4 Allosteric enzymes discovered in E. coli

Enzyme | Gene | Activator(s) | Inhibitor(s)
Generation of precursor metabolites and energy
6-Phosphofructokinase I | pfkA | GDP, ADP | Phosphoenolpyruvate
6-Phosphofructokinase II | pfkB | - | ATP
Pyruvate kinase I | pykF | Fructose-1,6-bisphosphate | -
Pyruvate kinase II | pykA | AMP | -
Citrate synthase | gltA | Acetyl-CoA | NAD+, oxaloacetate, NADH
Phosphoenolpyruvate carboxylase | ppc | A long-chain fatty acid, GTP, fructose-1,6-bisphosphate, acetyl-CoA | (S)-Malate, L-aspartate
Amino acid biosynthesis
Aspartate kinase III | lysC | - | L-Lysine
Dihydrodipicolinate synthase | dapA | - | L-Lysine
Homoserine O-succinyltransferase | metA | - | S-Adenosyl-L-methionine, L-methionine
Chorismate mutase | pheA | - | L-Phenylalanine
Prephenate dehydratase | pheA | - | L-Phenylalanine
D-3-Phosphoglycerate dehydrogenase | serA | - | Glycine, L-serine
Carbamoyl phosphate synthetase | carB, carA | Ammonium, inosine-5’-phosphate, L-ornithine | Uridine-5’-phosphate
ATP phosphoribosyltransferase | hisG | - | L-Histidine
γ-Glutamyl kinase | proB | - | L-Proline
Nucleoside and nucleotide biosynthesis
Amidophosphoribosyltransferase | purF | - | Guanosine-5’-phosphate, AMP
Aspartate carbamoyltransferase | pyrI, pyrB | ATP | CTP
Uridylate kinase | pyrH | GTP | -
CTP synthetase | pyrG | GTP | -

Data collected from EcoCyc (http://ecocyc.org/)

Among the regulatory domains found in allosteric proteins, the ACT domain is a
motif that was first identified as a regulatory module in a number of diverse
proteins. Its name derives from three proteins of the domain family: aspartokinase,
chorismate mutase, and TyrA (prephenate dehydrogenase). It is a structural motif of
70-80 amino acids. The archetypical ACT domain is composed of four β strands and
two α helices arranged in a βαββαβ fold [26]. It is one of a growing number of
intracellular small-molecule binding domains that function in the control of
metabolism, solute transport, and signal transduction. In particular, most
ACT domain-containing proteins are involved in amino acid and purine synthesis,
and in many cases they are allosteric enzymes regulated by ligand binding [27]. For
instance, the archetypical ACT domain protein E. coli D-3-phosphoglycerate
dehydrogenase (3-PGDH) catalyzes the first step in serine biosynthesis, and its
activity is regulated by the binding of glycine and serine. Aspartokinase III from
E. coli acts as the first, key switch of the pathways for the synthesis of
aspartate-derived amino acids and is inhibited by its end product lysine. The
bifunctional chorismate mutase/prephenate dehydratase (P-protein) from E. coli
catalyzes the first two steps in phenylalanine biosynthesis, and its function is
inhibited by the binding of phenylalanine.
Although it is well known that the majority of proteins bind specific metabolites and
that such interactions are relevant to metabolic and gene regulation, there are so far
no efficient methods to identify functional allosteric protein-metabolite interactions
systematically. Based on dynamic metabolite data, an integrated approach
combining experiments and computation has recently been presented for the
discovery of allosteric regulation relevant in vivo [28]. In this approach, E. coli
cultures were switched every 30 s between media containing either pyruvate or
13C-labeled fructose or glucose. The resulting reversals of flux through glycolysis
were observed, and the rapid changes in metabolite concentrations were measured.
These data were then fitted to a kinetic model of glycolysis, and the consequences of
126 putative allosteric interactions for metabolite dynamics were systematically
tested. As a result, allosteric interactions governing the reversible switch between
gluconeogenesis and glycolysis were identified, including one through which
pyruvate activates fructose-1,6-bisphosphatase. This approach was shown to identify
the most likely interactions from large sets of putative allosteric interactions and to
provide hypotheses about their function.
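The model-based screening step described above can be illustrated, in a heavily simplified form, by fitting a rate law with and without each candidate effector to a flux time course and ranking the candidates by residual error. This sketch is not the published method [28]: it assumes a hypothetical rate law v = k·s·f(effector), a coarse grid for the effector constant, and noise-free synthetic data, and all names are illustrative. A real comparison would use a criterion that penalizes extra parameters.

```python
import math


def best_rss(v_obs, s, effector, mode, k_grid):
    """Smallest residual sum of squares for v = k * s * f(effector) over a grid of
    effector constants; k has a closed-form least-squares solution for each grid value."""
    best = math.inf
    for kx in k_grid:
        if mode == "inhibitor":
            g = [si / (1.0 + ei / kx) for si, ei in zip(s, effector)]
        else:  # "activator"
            g = [si * (1.0 + ei / kx) for si, ei in zip(s, effector)]
        k = sum(vi * gi for vi, gi in zip(v_obs, g)) / sum(gi * gi for gi in g)
        best = min(best, sum((vi - k * gi) ** 2 for vi, gi in zip(v_obs, g)))
    return best


def rank_interactions(v_obs, s, candidates, k_grid):
    """Rank putative allosteric interactions, plus a no-effector baseline, by fit quality."""
    zeros = [0.0] * len(s)
    scores = [("none", best_rss(v_obs, s, zeros, "activator", [1.0]))]
    for name, (mode, effector) in candidates.items():
        scores.append((name, best_rss(v_obs, s, effector, mode, k_grid)))
    return sorted(scores, key=lambda item: item[1])


# Hypothetical noise-free time courses sampled at 20 points.
t = range(20)
s = [1.0 + 0.5 * math.sin(ti / 3) for ti in t]
pyruvate = [2.0 + 1.5 * math.cos(ti / 4) for ti in t]
decoy = [1.0 + 0.1 * ti for ti in t]
# Synthetic "observed" flux generated with pyruvate as the true activator.
v_obs = [2.0 * si * (1.0 + pyr_i / 0.5) for si, pyr_i in zip(s, pyruvate)]
candidates = {"pyruvate": ("activator", pyruvate), "decoy": ("inhibitor", decoy)}
ranking = rank_interactions(v_obs, s, candidates, k_grid=[0.25, 0.5, 1.0, 2.0, 4.0, 8.0])
print(ranking[0][0])  # pyruvate
```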


3 Engineering of Bioswitches 


Bioswitches that can sense different signals are interesting biological components
for realizing dynamic control of metabolic fluxes. Although a variety of natural
bioswitches has been discovered, new bioswitches, especially those that respond to
metabolites of the target biosynthetic pathways, are still needed in practice. For
example, the construction of artificial bioswitches that respond to non-natural signals
is both challenging and highly desirable for precise and dynamic control of fluxes
through growth-essential but competing pathways in the metabolic engineering of
industrial microorganisms. From an engineering perspective, both DNA-level and
protein-level bioswitches are protein based, although they function at different
molecular levels. They are therefore grouped together as protein-based bioswitches
in this section, whereas the RNA-level bioswitches described above form the group
of RNA-based bioswitches.


3.1 Engineering of Protein-Based Bioswitches 


As the natural bioswitches show, allosteric regulation is a highly efficient
mechanism for controlling metabolism in most biological processes. It maintains
metabolic fluxes and limits the accumulation of metabolic intermediates through the
binding of effector molecules, which are considered to act in a purely structural
manner by selectively stabilizing a specific conformational state [29].
Understanding the mechanisms of allosteric regulation, especially the pathways that
transmit the signal from the allosteric site to the active site upon effector binding,
can provide useful information for engineering bioswitches with novel properties.
However, proteins are inherently dynamic molecules that undergo structural
fluctuations over a wide range of timescales. A thorough knowledge of the principles
governing protein dynamics is therefore of fundamental importance for functional
studies and for the design of new protein functions.


Computational Modeling of Allosteric Regulation 


Rapid advances have been made in the past few years in the investigation of
protein dynamics. Besides experimental approaches such as NMR relaxation,
ultra-high-resolution low-temperature X-ray crystallography, and ultrafast laser
techniques, computational tools such as molecular dynamics simulations of protein
dynamics and allostery offer the opportunity to explore mechanistic details that are
difficult to observe experimentally. There are two key challenges in the
computational modeling of allostery. The first is to predict the structure of one
allosteric state, starting from the structure of the other, together with the transition
states between them, or alternatively to sample the conformational states of the
allosteric ensemble. The second is to elucidate the mechanisms underlying the
conformational coupling of the effector and active sites and to identify the residues
that mediate the allosteric process. In practice, these challenges are being addressed
through the development of novel modeling approaches and computational
procedures.


Prediction of Transition States 


Characterization of the conformational states of allosteric proteins requires access
to long-timescale motions, which are currently inaccessible to standard molecular
dynamics simulations. In addition, large-scale conformational changes in proteins
involve barrier-crossing transitions on complex free energy surfaces in
high-dimensional space. Such rare events cannot be captured efficiently by
conventional molecular dynamics simulations, so special computational approaches
are needed to explore the protein dynamics underlying allosteric regulation. To this
end, accelerated molecular dynamics approaches, which extend the effective
simulation time and capture large-scale motions of functional relevance, have been
explored. They were employed to investigate the conformational changes
associated with substrate binding to the Trypanosoma cruzi proline racemase
enzyme (TcPR), which are believed to expose critical residues that elicit a host
mitogenic B-cell response [30]. Subsequent conservation and fragment mapping
analyses further characterized potential conformational epitopes located near newly
identified transient binding pockets. To characterize the free energy profile of a
conformational transition pathway in a high-dimensional space, Matsunaga and
coworkers combined the on-the-fly string method with the multistate Bennett
acceptance ratio (MBAR) method [31]. In their study of E. coli adenylate kinase, the
minimum free energy paths of the conformational transitions were explored by the
on-the-fly string method in the 20-dimensional space spanned by the 20
largest-amplitude principal modes. This combined approach also makes it possible
to evaluate the free energy and various average physical quantities along the
pathways.
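As an illustration of how accelerated molecular dynamics extends effective sampling, the sketch below implements the widely used boost potential of Hamelberg and coworkers, ΔV = (E − V)²/(α + E − V) whenever the potential energy V lies below a threshold E (and zero otherwise), together with the exponential reweighting factor exp(ΔV/kT) used to recover canonical averages. Whether the study cited above [30] used exactly this form is an assumption; the function names and frame data are hypothetical.

```python
import math


def amd_boost(potential, threshold, alpha):
    """Boost energy added when V < E: dV = (E - V)^2 / (alpha + E - V); zero otherwise."""
    if potential >= threshold:
        return 0.0
    gap = threshold - potential
    return gap * gap / (alpha + gap)


def reweighted_average(observable, boosts, kt):
    """Canonical average of an observable from aMD frames, weighting each frame by exp(dV/kT)."""
    weights = [math.exp(dv / kt) for dv in boosts]
    return sum(w * o for w, o in zip(weights, observable)) / sum(weights)


# Hypothetical frames: potential energies (kcal/mol) and an observable (e.g., a distance in Å).
energies = [-120.0, -105.0, -98.0, -130.0]
distances = [12.1, 14.8, 15.2, 11.7]
boosts = [amd_boost(v, threshold=-100.0, alpha=20.0) for v in energies]
print(boosts)  # [10.0, 1.0, 0.0, 18.0]
print(reweighted_average(distances, boosts, kt=0.596))  # kT in kcal/mol near 300 K
```

Because the weights grow exponentially with ΔV, a few strongly boosted frames can dominate the reweighted average, which is one reason moderate boost parameters are usually chosen.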

Challenges encountered in the study of long-timescale motions can also be
addressed by simplified modeling approaches, such as normal mode models and the
combination of network construction with coarse-grained models. To study the
rigor to post-rigor transition in myosin, which is a consequence of ATP binding, a
normal mode superposition model was developed to predict the transition path
between the two states obtained from X-ray structures [32]. It was shown that
rigid-body motions of the various subdomains and specific residues at the
subdomain interfaces are key elements of the transition. The allosteric
communication between the nucleotide binding sites resulted from local changes
upon ligand binding, which induced large-amplitude motions in the protein structure.
It has been hypothesized that allosteric communication in proteins relies on
networks of quaternary (collective, rigid-body) and tertiary (residue-residue contact)
motions, and that a cyclic topology of these networks is necessary for allosteric
communication. To test this hypothesis, Daily and Gray proposed a novel procedure
[33]. In this procedure, rigid bodies were first identified from the displacement
between the inactive and active structures, and “quaternary networks” were
constructed from these rigid bodies. “Global communication networks” were then
formed by integrating the quaternary networks with a coarse-grained representation
of contact rearrangements.
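The first step of this procedure, identifying rigid bodies from the displacement between two structures, can be approximated with a simple distance-difference criterion: residues whose mutual distances barely change between the two states move together. The sketch below applies this criterion with union-find grouping. The tolerance, contact cutoff, coordinates, and function name are assumptions, and the code is not the specific algorithm of Daily and Gray [33].

```python
import math


def rigid_clusters(coords_a, coords_b, tol=0.5, cutoff=10.0):
    """Group residues into putative rigid bodies. Residues i and j are linked when they are
    in contact in state A (distance <= cutoff) and their distance changes by less than tol
    between states A and B; linked residues are merged with union-find."""
    n = len(coords_a)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            d_a = math.dist(coords_a[i], coords_a[j])
            d_b = math.dist(coords_b[i], coords_b[j])
            if d_a <= cutoff and abs(d_a - d_b) < tol:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical coordinates (Å): residues 3-5 move as one block between the two states.
inactive = [(0, 0, 0), (3.8, 0, 0), (7.6, 0, 0), (0, 5, 0), (3.8, 5, 0), (7.6, 5, 0)]
active = inactive[:3] + [(x, y + 10, z) for x, y, z in inactive[3:]]
print(rigid_clusters(inactive, active))  # [[0, 1, 2], [3, 4, 5]]
```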


Elucidation of Signal Transduction Pathways 


To discover the signal transduction pathways that mediate allosteric communication
and the key residues involved in the allosteric process, various approaches and
algorithms have been reported, most of which are based on the results of molecular
dynamics simulations. For example, an interaction-correlation analysis of molecular
dynamics trajectories has been proposed and applied to the PDZ2 domain to identify
possible signal transduction pathways [34]. In this approach, a residue correlation
matrix is constructed from the correlations of interaction energies between all
residue pairs along the simulation trajectories. With this matrix, continuous
interaction pathways can be discovered by hierarchical clustering analysis, and the
energetic origin of the long-range coupling associated with allosteric regulation can
be revealed. In another study, Ma and coworkers [35] investigated the anticooperative
mechanism of the PII protein from Synechococcus elongatus upon binding of
2-oxoglutarate. The size of the binding pocket was first defined by identifying the
residues that contributed most to ligand binding, and it was then found that
anticooperativity was realized through an asymmetric population shift in binding
pocket size. Based on dynamic correlation analysis, a new algorithm was developed
and used to discover residues that mediate the anticooperative process with high
probability. Chen et al. [36, 37] used aspartokinase, an important allosteric enzyme
for industrial amino acid production, as a model system to demonstrate a predictive
approach that combines protein dynamics and evolution for the rational
reengineering of enzyme allostery. In this method, molecular dynamics simulations
of aspartokinase and statistical coupling analysis of protein sequences of the
aspartokinase family were combined to identify a cluster of residues that are
correlated during protein motion and coupled during evolution. This cluster of
residues was believed to form an interconnected network that mediates allosteric
regulation. Experimental verification by mutating the key residues demonstrated the
high efficiency and reliability of the combined approach for the deregulation of
aspartokinase from both E. coli and Corynebacterium glutamicum.
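A simplified version of correlation-based pathway analysis can be sketched as follows: compute Pearson correlations between per-residue time series (for example, a residue's interaction energy in each trajectory frame, which is an assumed input), then cut a single-linkage hierarchical clustering at a correlation threshold. That cut is equivalent to taking the connected components of the graph of strongly correlated residue pairs. The threshold, data, and helper names are hypothetical, and the cited method [34] correlates pairwise interaction energies rather than per-residue series.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 for a constant series."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def correlated_clusters(series, threshold=0.9):
    """Single-linkage clusters of residues whose |correlation| >= threshold (union-find)."""
    n = len(series)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if abs(pearson(series[i], series[j])) >= threshold:
                parent[find(i)] = find(j)
    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Hypothetical per-frame interaction energies for four residues over five frames.
series = [[1, 2, 3, 4, 5], [2, 4, 6, 8, 10], [1, 3, 2, 5, 4], [2, 1, 3, 1, 2]]
print(correlated_clusters(series))  # [[0, 1], [2], [3]]
```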

To gain more insight into how intramolecular communication occurs within an
allosteric protein, a perturbation response scanning method has been developed,
which couples elastic network models with linear response theory to predict critical
residues in allosteric transitions [38]. In a study of the PDZ domain, the residues with
the highest mean square fluctuation response upon perturbation of the binding sites
agreed well with the residues experimentally determined to be involved in allosteric
transitions. Allosteric pathways can then be constructed by linking residues that
respond in the same direction upon perturbation of the binding sites. The idea of
perturbation has also been used in the energy dissipation model of allosteric
regulation proposed by Ma et al. [39]. With E. coli aspartokinase II as a model
system, a novel approach based on the energy dissipation model was developed to
reveal the intramolecular signal transduction network [40]. A key feature of this
approach is that directional information is assigned after the residue-residue
interaction network involved in signal transduction has been inferred. This enables
analysis of the regulation hierarchy and identification of regulatory hubs in the
signaling network. The energy dissipation model and network construction method
have also been applied successfully to a heteromultimeric allosteric protein,
C. glutamicum aspartokinase, to explore intersubunit interactions and allosteric
communication, with emphasis on the intersubunit signaling process [41].
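The linear-response idea behind perturbation response scanning can be sketched with a scalar, GNM-like elastic network. In this simplification, the response of every node to a unit force on node j is obtained by solving K·Δx = f, where K is the Kirchhoff (graph Laplacian) matrix of contacts. Two assumptions make this tractable: a small grounding spring on each node renders K invertible, and scalar forces replace the three-dimensional force vectors of the cited method [38]. The function names and the toy chain are hypothetical.

```python
import math


def kirchhoff(coords, cutoff):
    """Gaussian-network-style Kirchhoff (graph Laplacian) matrix for contacts within cutoff."""
    n = len(coords)
    k = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                k[i][j] = k[j][i] = -1.0
                k[i][i] += 1.0
                k[j][j] += 1.0
    return k


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def prs_matrix(coords, cutoff=7.0, grounding=0.1):
    """Row j holds the magnitude of each node's linear response to a unit force on node j."""
    k = kirchhoff(coords, cutoff)
    for i in range(len(coords)):
        k[i][i] += grounding  # assumed grounding spring so that k is invertible
    rows = []
    for j in range(len(coords)):
        force = [0.0] * len(coords)
        force[j] = 1.0
        rows.append([abs(d) for d in solve(k, force)])
    return rows


chain = [(3.8 * i, 0.0, 0.0) for i in range(5)]  # hypothetical linear chain of C-alpha atoms
response = prs_matrix(chain, cutoff=4.0)
print([round(r, 3) for r in response[0]])  # response decays with distance from node 0
```

Residues whose response to a binding-site perturbation stands out from this background decay would be the candidates for mediating allosteric transitions.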


Strategies for Engineering Protein-Based Bioswitches 

Several strategies have been reported for developing new protein-based bioswitches
(Fig. 2): engineering the binding pocket of an existing allosteric protein, fusing a
naturally occurring allosteric domain to the protein of interest, or directly modifying
the structure of a non-allosteric enzyme.


Engineering Bioswitches for New Ligand Binding 


Because a typical bioswitch has at least two distinct binding sites (the effector
binding site, responsible for recognizing signal molecules, and the active site,
responsible for binding the modulated molecules), the most straightforward
approach to constructing novel bioswitches is to engineer the binding site of an
existing bioswitch for new ligands. Rational design, directed evolution, or a
combination of the two can be employed to obtain new binding sites. For
engineering protein-based bioswitches, extensive experience from protein
engineering, with many practical successes, is available. Nevertheless, the challenge
of this strategy is to preserve the allosteric function of the bioswitch after
modification.

Fig. 2 Different strategies used for engineering of protein-based bioswitches. (a) Engineering
bioswitches for new effector binding. (b) Engineering bioswitches for new substrate binding. (c)
Engineering of bioswitches via domain swapping. (d) Engineering of bioswitches via domain
insertion. (e) Creation of de novo bioswitches. W tryptophan; G glycine. Adapted from Deckert
et al. [50]

In an example by Tang and Cirino [42], the AraC regulatory protein from the
E. coli ara operon was engineered to activate transcription in response to
D-arabinose and not in response to its native effector, L-arabinose. To achieve this,
two different AraC mutant libraries, each with four randomized binding-pocket
residues, were subjected to fluorescence-activated cell sorting (FACS)-mediated
dual screening using a GFP reporter. Both libraries yielded mutants with the desired
switch in effector specificity, and one mutant was found to maintain tight repression
in the absence of effector. This example demonstrates the power of dual screening
for altering the ligand binding specificity of a regulatory protein and represents a
step toward the design of customized in vivo molecular reporters and genetic
switches for metabolic engineering.
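The logic of dual screening, first enriching variants that turn on with the new effector and then removing those that are active when they should be off, can be sketched as sequential positive and negative sorting gates. The reporter values, gate thresholds, and the choice of negative conditions below are hypothetical and do not reproduce the sorting scheme of [42].

```python
def sort_round(population, condition, keep, gate):
    """One FACS-like round: keep variants whose reporter signal under `condition`
    is at or above (keep='high') or at or below (keep='low') the gate."""
    if keep == "high":
        return {name: f for name, f in population.items() if f[condition] >= gate}
    return {name: f for name, f in population.items() if f[condition] <= gate}


def dual_screen(population, positive, negatives, on_gate, off_gate):
    """Positive sort with the desired effector, then negative sorts under each
    condition in which the switch should stay off."""
    selected = sort_round(population, positive, "high", on_gate)
    for condition in negatives:
        selected = sort_round(selected, condition, "low", off_gate)
    return sorted(selected)


# Hypothetical GFP signals (arbitrary units) for a few library members.
library = {
    "wild_type": {"D-arabinose": 5, "L-arabinose": 900, "none": 10},
    "mutant_1": {"D-arabinose": 850, "L-arabinose": 20, "none": 15},
    "mutant_2": {"D-arabinose": 700, "L-arabinose": 30, "none": 400},
    "mutant_3": {"D-arabinose": 600, "L-arabinose": 650, "none": 12},
}
print(dual_screen(library, "D-arabinose", ["none", "L-arabinose"], on_gate=500, off_gate=50))
# ['mutant_1']
```

Here mutant_2 is discarded because it is leaky without effector, and mutant_3 because it still responds to the native effector.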

In another study by Chen et al. [43], homoserine dehydrogenase (HSDH) of
C. glutamicum, which is naturally allosterically regulated by L-threonine and
L-isoleucine, was used to demonstrate the feasibility of reengineering an allosteric
enzyme to respond to a non-natural inhibitor, L-lysine. To this end, the natural
L-threonine binding sites of HSDH were first predicted and verified by mutagenesis.
The L-threonine binding sites were then converted into an L-lysine binding pocket
by replacing a key loop responsible for ligand binding specificity, yielding a
reengineered HSDH that responded only to L-lysine inhibition and not to
L-threonine. Because the L-threonine biosynthetic pathway is essential for cell
growth but competes with the biosynthesis of L-lysine, this study represents a
significant step toward the construction of artificial molecular circuits for dynamic
control of this growth-essential byproduct pathway during L-lysine production.


Engineering of Bioswitches via Domain Fusion 


This strategy is based on the fact that the structures of proteins are often organized 
in functional domains and this is also true for allosteric proteins. In allosteric 
proteins, the signal recognition function is conducted by the regulatory domain 
and the active domain is a DNA binding domain in transcription factors or a 
catalytic domain in the case of allosteric enzymes. Thus, an efficient approach to 
construct protein-based bioswitches is to create hybrid proteins with switch-like 
behavior via domain fusion of two existing domains. For this purpose, the domain 
containing the function to be modulated is fused with the signal recognition 
domain. The challenge of this strategy is that the fusion should be conducted in a 
proper manner so that the input signal can be transmitted from the regulatory 
domain to the active domain, thereby modulating its activity. In practice, two 
approaches are usually employed to link the two functional domains. One is domain
swapping, which simply connects the termini of the two domains to create the
recombinant allosteric protein. The other is domain insertion, which involves
circular permutation of one domain [44].

The strategy of domain swapping has recently been demonstrated with 3-deoxy-
D-arabino-heptulosonate 7-phosphate synthase (DAHPS), the first enzyme of the
aromatic amino acid biosynthetic pathway. DAHPS shows remarkable variation in allosteric
response and machinery, and both regulated and unregulated contemporary
orthologs have been reported. A chimeric protein was generated by joining the
catalytic domain of an unregulated DAHPS with the regulatory domain of a
regulated enzyme [45]. It was shown that this simple gene fusion event on its own
was sufficient to confer functional allostery on the unregulated enzyme. The fusion
protein shared structural similarities with its regulated parent protein and underwent
an analogous major conformational change in response to the binding of the allosteric
effector L-tyrosine to the regulatory domain. Domain swapping can also be used to
create novel bioswitches by connecting domains from heterologous proteins. This
has been illustrated by recombining the genes coding for TEM1 β-lactamase (BLA)
and the E. coli maltose binding protein (MBP) to create a family of MBP-BLA
hybrids in which maltose was a positive or negative effector of β-lactam hydrolysis
[46]. Some of the constructed MBP-BLA switches were effectively “on-off” in
nature, with maltose altering catalytic activity by as much as 600-fold.

The strategy of domain insertion is to engineer regulatory activities into proteins 
through interface design at conserved allosteric sites by creating a chimeric protein. 
A hybrid protein named PAS-DHFR has been constructed by connecting a light- 
sensing signaling domain from a plant member of the Per/Arnt/Sim (PAS) family of 
proteins with the dihydrofolate reductase (DHFR) from E. coli [47]. With no 
optimization, the hybrid protein exhibited light-dependent catalytic activity which 
depended on the site of connection and on known signaling mechanisms in both 
proteins. This example demonstrated that the intramolecular networks of two pro- 
teins can be joined across their surface sites such that the activity of one protein can 
be controlled by the activity of the other. In a recent study, a protein with a unique 
topology, called uniRapR, was constructed with the aid of computational protein 
design [48]. The key feature of this chimeric protein is that its conformation is
controlled by the binding of a small molecule, so that it can serve as an
artificial regulatory domain to control the activity of kinases. To prove this,
activation of Src kinase using uniRapR was demonstrated in both single cells and 
whole organisms. The rational creation of uniRapR not only offers a powerful 
means for targeted activation of many pathways to study signaling in living 
organisms but also exemplifies the strength of computational protein design. The 
more recent work by Feng et al. [49] attempts to provide a general methodology to 
develop biosensors for a broad range of molecules in eukaryotes. In this method, the 
ligand-binding domain is fused to either a fluorescent protein or a transcriptional 
activator; the key feature is that the binding domain is destabilized by mutation so that
the fusion accumulates only in cells containing the target ligand. When this method 
was employed to develop biosensors for digoxin and progesterone, it was found that 
transcription was activated with a dynamic range of up to ~100-fold upon addition 
of ligand to the cells. 


Creation of De Novo Bioswitches 


Disruption and recovery of protein structure may represent a general technique for 
introducing allosteric control into proteins, and thus serves as a starting point to 
build a variety of protein-based bioswitches. This strategy has recently been 
demonstrated by designing a de novo allosteric effector site directly into the 
catalytic domain of an enzyme. This approach is distinct from traditional chemical rescue of
enzymes in that it relies on disruption and restoration of structure, rather than active
site chemistry, as a means to modulate function. In the two examples given
by Deckert and coworkers [50], W33G in a β-glycosidase enzyme and W492G in a
β-glucuronidase enzyme, indole-dependent activities were engineered into
enzymes by removing a buried tryptophan side chain that served as a buttress
for the active site architecture. In both cases the loss of function could be restored by
the subsequent addition of indole. In particular, the rescued β-glycosidase was fully
functionally equivalent to the corresponding wild-type enzyme, and its activity could
be modulated in living cells using indole as an input signal.


3.2 Engineering of RNA-Based Bioswitches


RNAs are ideal for the design of gene switches that can monitor and program 
cellular behavior. Because of the modular composition of riboswitches, engineered 
riboswitches can be made by first exploiting RNA aptamers as the core component and
then combining different aptamers and expression platforms. Novel riboswitches 
can also be constructed and identified through model-driven approaches or even 
rationally designed via structure-based methods (Fig. 3). 


In Vitro Selection Technology 


Engineered riboswitches can be made by exploiting RNA aptamers as the core 
component. Aptamers can be generated by an in vitro directed evolution technology
called SELEX (systematic evolution of ligands by exponential enrichment). In this 
approach, a pool of randomized sequences is mixed with an immobilized target. 
Non-binding molecules are removed by washing whereas bound molecules are 
specifically eluted, amplified, and subjected to further rounds of selection. Gradu- 
ally increasing the stringency during the following cycles can lead to aptamers that 
bind with affinities in the picomolar range and discriminate between closely related 
compounds. Aptamers against a plethora of different ligands have been generated 
including ions, organic compounds such as amino acids or antibiotics, proteins, 
viruses, and even whole cells [51]. 
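To make the role of increasing stringency concrete, the following is a minimal deterministic sketch of SELEX enrichment, not a protocol from [51]. It assumes equilibrium binding without target depletion (retained fraction = [T]/([T] + K_d)), complete washing of unbound molecules, loss-free amplification that restores the pool size, and hypothetical K_d values; stringency is modeled simply by lowering the target concentration [T] in each round.

```python
def selex_round(abundance, kd, target_conc):
    """One idealized SELEX cycle: bind, wash, elute, amplify.

    abundance: relative abundance of each sequence (sums to 1)
    kd: dissociation constant of each sequence for the target (M)
    target_conc: free target concentration used in this round (M)
    """
    # Equilibrium fraction of each species retained on the immobilized target;
    # unbound molecules are assumed to be washed away completely.
    retained = {s: a * target_conc / (target_conc + kd[s]) for s, a in abundance.items()}
    total = sum(retained.values())
    # Amplification restores the pool size but keeps the relative proportions.
    return {s: r / total for s, r in retained.items()}


def run_selex(kd, target_schedule):
    """Start from an equimolar pool and apply one round per target concentration."""
    abundance = {s: 1.0 / len(kd) for s in kd}
    for conc in target_schedule:
        abundance = selex_round(abundance, kd, conc)
    return abundance


# Hypothetical pool of three aptamer candidates with different affinities (M).
kd = {"weak": 1e-6, "medium": 1e-8, "tight": 1e-11}

stringent = run_selex(kd, [1e-6, 1e-7, 1e-8, 1e-9, 1e-10])  # decreasing [T]
relaxed = run_selex(kd, [1e-6] * 5)                         # constant [T]
print({s: round(f, 3) for s, f in stringent.items()})
print({s: round(f, 3) for s, f in relaxed.items()})
```

Under these assumptions the tight binder takes over more than 99% of the pool only when stringency rises from round to round; at a constant, high target concentration it ends up sharing the pool roughly equally with the medium binder.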

However, most naturally found riboswitches down-regulate gene expression on 
metabolite binding, probably because of their roles in negative feedback regulation 
within the metabolic pathways. Selections of gene switches have been performed 
by alternately employing separate ON (positive) and OFF (negative) selection 
markers. The use of independent selection markers for the ON and OFF selections 
significantly complicates the selection process by requiring plasmid isolation steps 
and increases the chances of isolating false positives in each step. To overcome this, 
an efficient platform to select engineered riboswitches and logic gates from com-
plex libraries using a single selection marker has been established by taking
advantage of tetA, which encodes a tetracycline/H+ antiporter, as both a positive
and a negative selection marker [52]. Expression of TetA confers tetracycline
resistance on the cells (ON selection), whereas overexpression of this
membrane-bound protein renders them more sensitive to toxic metal salts such as
NiCl2 and other compounds (OFF selection). Use of a single selection marker for
ON and OFF selections not only simplifies the selection procedure but also makes
the process more robust against false positives. With the dual selection approach, a
lysine-ON riboswitch was recently obtained from the lysine-OFF
riboswitch and used for improving L-lysine bioproduction [53].

Fig. 3. Different approaches for engineering of riboswitches. (a) In vitro selection of new
aptamers. (b) In vitro dual selection of riboswitches. (c) Module-based construction of
riboswitches with new aptamer. (d) Investigation of riboswitches with mathematical models. (e)
Structure-based design of riboswitches

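The logic of single-marker dual selection can be sketched as two successive filters on a library, under strong simplifying assumptions: each variant is described only by a hypothetical TetA expression level with and without ligand, and survival is a hard threshold (tetracycline kills cells below one level in the ON step; NiCl2 kills cells above another level in the OFF step). Real selections are probabilistic and usually run for several rounds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    name: str
    tetA_with_ligand: float     # hypothetical expression level, arbitrary units
    tetA_without_ligand: float


def on_selection(library, tet_threshold):
    """+ligand, +tetracycline: only variants expressing enough TetA survive."""
    return [v for v in library if v.tetA_with_ligand >= tet_threshold]


def off_selection(library, nickel_threshold):
    """-ligand, +NiCl2: variants with leaky TetA expression are killed."""
    return [v for v in library if v.tetA_without_ligand <= nickel_threshold]


def dual_selection(library, tet_threshold, nickel_threshold):
    """Apply the ON and OFF steps using the same marker (tetA)."""
    survivors = on_selection(library, tet_threshold)
    return off_selection(survivors, nickel_threshold)


library = [
    Variant("constitutive", 10.0, 10.0),  # always on: removed in the OFF step
    Variant("dead", 0.1, 0.1),            # never on: removed in the ON step
    Variant("OFF-switch", 0.2, 9.0),      # inverted response: removed in the ON step
    Variant("ON-switch", 8.0, 0.3),       # desired ligand-activated switch
]
hits = dual_selection(library, tet_threshold=5.0, nickel_threshold=1.0)
print([v.name for v in hits])  # ['ON-switch']
```

Because both steps read out the same gene, a single construct can be carried through ON and OFF rounds without swapping markers, which is the practical advantage described above.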

Although in vitro selection technology is a versatile experimental tool for the
discovery of novel RNA molecules, finding complex RNA molecules is difficult 
because most RNAs identified from random sequence pools are simple motifs. 
Thus, enriching in vitro selection pools with complex structures could increase the 
probability of discovering novel RNAs. Recently, a computational approach was 
presented for designing a starting library of RNA sequences with increased forma- 
tion of complex structural motifs and enhanced affinity to a desired target molecule 
[54]. This approach consists of two steps: (1) generation of RNA sequences based 
on customized patterning of nucleotides with increased probability of forming a 
base pair and development of a set of criteria used for selection of a sequence with 
potential binding affinity; (2) with a protocol for RNA 3D structure prediction, a 
high-throughput virtual screening of the generated library is carried out to select 
aptamers with binding affinity to a small-molecule target. With integration of 
in vitro selection technology, this approach is expected to accelerate the experi- 
mental screening and selection of high-affinity aptamers by significantly reducing 
the search space of RNA sequences. 
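Step (1) of this approach can be illustrated with a toy generator. This is not the algorithm of [54]: it assumes a hypothetical target secondary structure in dot-bracket notation and a single parameter p_pair, the probability that each designed base pair is filled with a Watson-Crick complement, while unpaired positions stay random. The 3D structure prediction and virtual screening of step (2) are not modeled.

```python
import random

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}
CAN_PAIR = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def pair_table(structure):
    """Map each '(' index to its matching ')' index in dot-bracket notation."""
    stack, pairs = [], {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs[stack.pop()] = i
    if stack:
        raise ValueError("unbalanced structure")
    return pairs


def patterned_sequence(structure, p_pair, rng):
    """Random sequence biased so that designed stems form with probability p_pair."""
    seq = [rng.choice("ACGU") for _ in structure]
    for i, j in pair_table(structure).items():
        if rng.random() < p_pair:
            seq[j] = COMPLEMENT[seq[i]]
    return "".join(seq)


def pairable_fraction(seq, structure):
    """Fraction of designed pairs that are Watson-Crick or G-U wobble pairs."""
    pairs = pair_table(structure)
    return sum((seq[i], seq[j]) in CAN_PAIR for i, j in pairs.items()) / len(pairs)


target = "((((....))))..((((....))))"  # hypothetical two-hairpin scaffold
rng = random.Random(1)


def mean_pairable(p_pair, n=200):
    return sum(pairable_fraction(patterned_sequence(target, p_pair, rng), target)
               for _ in range(n)) / n


print(round(mean_pairable(0.0), 2), round(mean_pairable(0.9), 2))
```

Under these assumptions the expected pairable fraction rises from 6/16 = 0.375 for an unbiased pool to about 0.94 for p_pair = 0.9, which is the kind of enrichment in structured sequences that a patterned starting library aims for.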


Module-Based Construction 


Although in vitro selection technology is efficient for discovering aptamers, it has been
pointed out that only a few aptamers have the potential to be exploited as sensing 
domains for the engineering of riboswitches because a conformational change upon 
ligand binding has to occur and the association of the ligand has to be fast [55]. It 
has been demonstrated that riboswitches are modular in that they can host a variety 
of natural and synthetic aptamers to create novel chimeric RNAs that regulate 
transcription both in vitro and in vivo [56]. Modularity of riboswitches therefore 
enables facile engineering of novel genetic regulatory devices from aptamers. 
Moreover, this technique does not require selection of device-specific “communi-
cation modules” to transmit ligand binding to the regulatory domain,
enabling rapid engineering of novel functional RNAs. With the module-based
approach, it has been shown that transcriptional “ON” riboswitches are also
capable of hosting foreign aptamers [57].

The design criteria for synthetic riboswitches acting on transcription have 
recently been examined by Wachsmuth et al. [58] using theophylline-dependent 
riboswitches as model systems. It was shown that terminator hairpin stability and 
folding traps had a major impact on the functionality of the designed constructs. 
Furthermore, a combination of several copies of individual riboswitches led to a 
much improved activation ratio between induced and uninduced gene activity and 
to a linear dose-dependent increase in reporter gene expression. By taking advan- 
tage of the modularity of riboswitches, novel riboswitches that work in a eukaryotic 
cell-free translation system have also been constructed [59]. In these riboswitches,
translation mediated by an internal ribosome entry site was promoted only in the 
presence of a specific ligand, whereas it was inhibited in the absence of the ligand. 
The riboswitch, which was regulated by theophylline, showed a high switching 
efficiency and dependency on theophylline. In addition, three other kinds of
riboswitches, controlled by FMN, tetracycline, and sulforhodamine B, were
constructed simply by calculating the ΔG value of one stem-loop structure.


Computational Approaches 


It is known that the function of riboswitches can be modulated through sequence 
alteration, but there are no quantitative frameworks to investigate or guide 
riboswitch tuning. It remains unclear how their sequence controls the physics of 
riboswitch switching and activation, particularly when changing the ligand-binding 
aptamer domain. To this end, mathematical modeling was combined with experi- 
mental approaches to investigate the relationship between riboswitch function and 
their performance [60]. Modeling results showed that the competition between 
reversible and irreversible rate constants dictated the performance for different 
regulatory mechanisms. It was also found that practical system restrictions, such 
as an upper limit on ligand concentration, can significantly alter the requirements 
for riboswitch performance, necessitating alternative tuning strategies. In another 
study, a statistical thermodynamic model was reported to predict the sequence- 
structure-function relationship for translation-regulating riboswitches that activate 
gene expression, characterized inside cells and within cell-free transcription-trans- 
lation assays [61]. With this model, automated computational design was carried 
out for 62 synthetic riboswitches that used 6 different RNA aptamers to sense 
diverse chemicals (theophylline, tetramethylrosamine, fluoride, dopamine, thyrox- 
ine, 2,4-dinitrotoluene) and activated gene expression by up to 383-fold. 
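The competition between reversible and irreversible rates mentioned above can be illustrated with a minimal three-rate sketch, an assumed simplification rather than the model of [60]. The nascent aptamer binds ligand with rate k_on[L] and releases it with rate k_off, while the polymerase (or ribosome) passes the decision point irreversibly with rate k_escape. Solving this first-passage problem gives the probability that the RNA commits while bound, P = k_on[L] / (k_on[L] + k_off + k_escape), so the half-maximal ligand concentration shifts from K_d = k_off/k_on in the thermodynamic limit (k_escape → 0) to (k_off + k_escape)/k_on when escape is fast. All parameter values below are hypothetical.

```python
def committed_fraction(ligand, k_on, k_off, k_escape):
    """Probability that a nascent riboswitch commits in the ligand-bound state.

    ligand in M, k_on in 1/(M*s), k_off and k_escape in 1/s (all hypothetical).
    """
    binding = k_on * ligand
    return binding / (binding + k_off + k_escape)


def half_max_ligand(k_on, k_off, k_escape):
    """Ligand concentration at which half of the transcripts commit to the bound fold."""
    return (k_off + k_escape) / k_on


k_on, k_off = 1e5, 0.1   # K_d = k_off / k_on = 1 uM
ligand_cap = 1e-4        # assumed upper limit on intracellular ligand (100 uM)

for k_escape in (0.0, 1.0, 100.0):
    print(k_escape,
          half_max_ligand(k_on, k_off, k_escape),
          round(committed_fraction(ligand_cap, k_on, k_off, k_escape), 3))
```

In this sketch, a fast escape rate pushes the half-maximal concentration above the assumed ligand cap, so improving equilibrium affinity alone no longer helps; tuning has to target k_on or the escape rate, which is one way to read the observation that an upper limit on ligand concentration changes the tuning requirements.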

Because three-dimensional structures are available for a growing subset of
RNAs, structure-based techniques have been employed to study riboswitch
mechanisms and to guide the prediction and design of specific functions and new
characteristics. For instance, guided by 3D structures, Wilson-Mitchell et al. [62]
examined the recognition and specificity mechanisms of lysine riboswitches. In
another report, a structure-based design approach was combined with a fluorescence
binding assay to study the SAM-II riboswitch aptamer and to identify
a SAM analogue that selectively binds to the SAM-II riboswitch aptamer with compa-
rable binding affinity to its native metabolite [63]. In an attempt to de novo design a
synthetic riboswitch that regulates gene expression at the transcriptional level, an in 
silico pipeline was developed to design the actuator part as RNA sequences that can 
fold into functional intrinsic terminator structures [64]. Using the well- 
characterized theophylline aptamer as sensor, several of the designed constructs 
showed ligand-dependent control of gene expression in E. coli, demonstrating that
it is possible to engineer riboswitches not only for translational but also for 
transcriptional regulation. 


66 C.-W. Maet al. 


4 Applications of Bioswitches for Dynamic Metabolic 
Control 


Metabolic burden and imbalance caused by uncontrolled or deregulated metabolic 
pathways result in suboptimal productivity. Applications of bioswitches for 
dynamic control of metabolism can prevent the accumulation of temporarily 
unnecessary intermediates produced by heterologous pathways by fine-tuning the 
metabolic fluxes. Redirection of the endogenous resources into heterologous path- 
ways can be further tuned by dynamic control of competing pathways. In addition, 
modulation of bacterial behavior by manipulating molecular communication finds 
use in a variety of applications, particularly those employing natural or synthetic 
bacterial consortia (Fig. 4). 

Fig. 4 Application examples of bioswitches for dynamic metabolic control. (a) Dynamic control


4.1 Dynamic Control of Bioproduction Pathways


To demonstrate that product titers and conversion yields of heterologous pathways 
can be improved by dynamic control of bioproduction pathways, a dynamic sensor- 
regulator system was developed by Zhang et al. [65] to produce fatty acid-based
products in E. coli for biodiesel production. In this dynamic system, the transcrip-
tion factor FadR, which senses fatty acyl-CoA, was employed to dynamically regulate
the expression of genes involved in biodiesel production. With this
implementation, the stability of biodiesel-producing strains was substantially
improved. Additionally, the titer increased to 1.5 g/L and the yield increased threefold
to 28% of the theoretical maximum.

For fatty acid biosynthesis, the formation of malonyl-CoA, which is 
biosynthesized from acetyl-CoA by acetyl-CoA carboxylase, is the rate-limiting
step. Overexpression of acetyl-CoA carboxylase improves fatty acid
production, but it negatively affects cell growth. This problem may be
solved by dynamic compensation of the critical enzymes involved in the supply
and consumption of malonyl-CoA, which efficiently redirects carbon flux toward
fatty acid biosynthesis. As shown in the study by Xu et al. [66], implementation of
this metabolic control resulted in an oscillatory malonyl-CoA pattern and a bal- 
anced metabolism between cell growth and product formation, yielding 15.7- and 
2.1-fold improvements in fatty acid titer compared with the wild-type strain and the
strain carrying the uncontrolled metabolic pathway, respectively. Recently, another malonyl-
CoA sensor-actuator that controls gene expression levels based on intracellular
malonyl-CoA concentrations was devised [67]. With this sensor-actuator, the
expression of acetyl-CoA carboxylase can be controlled by negative feedback: it is
up-regulated when the
malonyl-CoA concentration is low and down-regulated when
excess malonyl-CoA accumulates. It was shown that the toxicity
associated with acetyl-CoA carboxylase overexpression can be effectively allevi- 
ated by the regulatory circuit. When the feedback circuit was used to regulate the 
fatty acid pathway, the fatty acid titer and productivity were increased by 34% and 
33%, respectively. The malonyl-CoA sensor can also be used in the production of 
other malonyl-CoA-derived products. In the work by David et al. [68], a hierarchi- 
cal dynamic control system was developed around the key pathway intermediate
malonyl-CoA. The upper level of the control system ensures down-regulation of the 
endogenous use of malonyl-CoA for fatty acid biosynthesis whereas the lower level 
of the control system is based on the use of a novel biosensor for malonyl-CoA to 
activate expression of a heterologous pathway that uses this metabolite for produc- 
tion of 3-hydroxypropionic acid (3-HP). It was shown that the production of 3-HP 
increased tenfold after introduction of the dual pathway control.
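The negative feedback loop of a malonyl-CoA sensor-actuator such as that in [67] can be sketched with two ordinary differential equations. Everything here is an assumption for illustration: E is the acetyl-CoA carboxylase level, M is malonyl-CoA, ACC synthesis is repressed by M through a Hill function with hypothetical constants K and n, M is produced in proportion to E and consumed first-order by the fatty acid pathway, and the equations are integrated with forward Euler.

```python
def simulate(feedback, alpha=1.0, K=2.0, n=2.0, delta=0.1,
             k_cat=1.0, k_use=0.5, t_end=200.0, dt=0.01):
    """Forward-Euler integration of ACC level E and malonyl-CoA level M.

    feedback=True:  dE/dt = alpha / (1 + (M/K)^n) - delta*E  (sensor-actuator represses ACC)
    feedback=False: dE/dt = alpha - delta*E                  (constitutive ACC)
    both:           dM/dt = k_cat*E - k_use*M                (ACC supply, fatty acid drain)
    All parameter values are hypothetical and dimensionless.
    """
    E, M = 0.0, 0.0
    for _ in range(int(round(t_end / dt))):
        synthesis = alpha / (1.0 + (M / K) ** n) if feedback else alpha
        dE = synthesis - delta * E
        dM = k_cat * E - k_use * M
        E, M = E + dE * dt, M + dM * dt
    return E, M


E_open, M_open = simulate(feedback=False)
E_fb, M_fb = simulate(feedback=True)
print(f"open loop: ACC={E_open:.2f}, malonyl-CoA={M_open:.2f}")
print(f"feedback:  ACC={E_fb:.2f}, malonyl-CoA={M_fb:.2f}")
```

With these hypothetical parameters the open-loop system settles at E = 10 and M = 20, while the feedback loop holds M near 4 (the real root of M^3 + 4M - 80 = 0) with E = 2, illustrating how repression caps both ACC overexpression and malonyl-CoA accumulation.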

Dynamic control of pathway enzymes requires sensors that can detect and 
respond to pathway products or intermediates, but these are largely unknown. In 
a recent attempt to improve the production of non-native isoprenoids, dynamic 
control of ERG9 expression was explored by using different ergosterol-responsive 
promoters [69]. For this purpose, several ergosterol-responsive promoters were
identified using quantitative real-time PCR analysis in an engineered strain with
relatively high mevalonate pathway activity. It was found that the expression levels
of ERG11, ERG2, and ERG3 were significantly lower in the engineered strain than in
the reference strain, indicating that these genes were transcriptionally down-
regulated when ergosterol was in excess. Replacement of the native ERG9 pro-
moter with these ergosterol-responsive promoters revealed that all engineered
strains improved amorpha-4,11-diene production by two- to fivefold over the reference strain
with ERG9 under its native promoter. Promoters that respond to the accumulation 
of toxic intermediates can also be identified via whole-genome transcript arrays and 
used to improve the final titers of a desired product by controlling accumulation of 
the intermediate. This approach was recently demonstrated by regulating farnesyl
pyrophosphate production in the isoprenoid biosynthetic pathway in E. coli [70]. It
was shown that this strategy was able to improve production of amorphadiene, the 
final product, by twofold over that from inducible or constitutive promoters with 
reduced acetate accumulation and improved growth. 


4.2 Dynamic Control of Competing Pathways 


Dynamical tuning of endogenous processes is another efficient approach for redi- 
rection of endogenous resources into heterologous pathways. In the study by
Solomon et al. [71], a metabolite valve was proposed to balance the demands of cell
health and pathway production. To realize it, a control node of glucose utilization, glucokinase
(Glk), was exogenously manipulated through either engineered antisense RNA or 
an inverting gene circuit. Results showed that these techniques were able to control 
glycolytic flux directly by redirection of glucose into a model pathway, leading to 
an increase in the pathway yield and reduced carbon waste to acetate. Moreover, the 
specific growth rate of engineered E. coli could be reduced by up to 50% without
altering final biomass accumulation. The same strategy was then employed to 
develop a metabolite valve in S. cerevisiae for control of glycolytic flux through 
the central carbon metabolism [72]. This was demonstrated by diverting glucose 
flux away from glycolysis into a model gluconate pathway in a hexokinase 2- and
glucokinase 1-deleted strain. A maximum tenfold decrease in hexokinase activity
was achieved by controlling the transcription of hexokinase 1 with the tetracycline
transactivator protein, resulting in a 50-fold increase in gluconate yield, from 0.7%
to 36% mol/mol of glucose. The reduction in glucose flux also led to a significant
decrease in ethanol byproduct formation, which extended to semianaerobic conditions, as shown
in the production of isobutanol. It is worth noting that these applications involve
control of a production pathway by external supplementation of inducers/repres-
sors, which differs from the endogenous dynamic regulation illustrated in
the other applications.
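The valve idea, that lowering the capacity of the glucose-phosphorylating enzyme shifts glucose toward a competing product pathway, can be illustrated with two Michaelis-Menten enzymes competing for the same substrate. This toy model and all its parameters are assumptions: it ignores transport, regulation, and downstream steps, and it is not meant to reproduce the 50-fold yield change reported in [72].

```python
def michaelis_menten(v_max, k_m, s):
    """Rate of an irreversible Michaelis-Menten enzyme at substrate level s."""
    return v_max * s / (k_m + s)


def product_share(glucose, v_max_kinase, k_m_kinase, v_max_product, k_m_product):
    """Fraction of glucose consumption flux routed to the product pathway."""
    v_kinase = michaelis_menten(v_max_kinase, k_m_kinase, glucose)
    v_product = michaelis_menten(v_max_product, k_m_product, glucose)
    return v_product / (v_kinase + v_product)


# Hypothetical parameters (mM, mM/min); glucose held at 1 mM.
params = dict(glucose=1.0, k_m_kinase=0.1, v_max_product=0.1, k_m_product=1.0)
full = product_share(v_max_kinase=10.0, **params)
valve_closed = product_share(v_max_kinase=1.0, **params)  # tenfold less kinase
print(round(full, 4), round(valve_closed, 4), round(valve_closed / full, 1))
```

In this sketch a tenfold reduction in kinase capacity raises the product share about 9.5-fold, because the kinase dominates glucose consumption; larger effects in vivo would require additional mechanisms beyond simple substrate competition.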

The deletion of a pathway responsible for growth and cell maintenance has 
seldom been employed, as conditional knockout is required to optimize 
intracellular metabolism at each fermentation phase for bacterial growth and
production. In this regard, a metabolic toggle switch was constructed in E. coli as
a novel conditional knockout approach and applied to isopropanol production
[73]. The resulting redirection of excess carbon flux caused by interruption of the 
TCA cycle via switching gltA OFF improved isopropanol production titer and yield 
up to 3.7 and 3.1 times, respectively. To control the competing but essential 
metabolic by-pathways of lysine biosynthesis, a similar strategy was employed to
control TCA cycle activity by using a lysine riboswitch with intracellular L-lysine
as the signal [74]. Lysine riboswitches from both E. coli (ECRS) and Bacillus subtilis
(BSRS) were used to control the gltA gene, and thus TCA cycle activity, in the
lysine-producing strain C. glutamicum LP917. Compared with strain LP917,
lysine production was 63% higher in the mutant ECRS-gltA and 38% higher in the
mutant BSRS-gltA, indicating a higher metabolic flux into the lysine synthesis
pathway. A lysine-ON riboswitch library was constructed using tetA-based dual
genetic selection based on the natural E. coli lysine-OFF riboswitch. Selected
lysine-ON riboswitches were linked with the lysE gene to achieve dynamic
control of lysine export in the recombinant lysine-producing strain
C. glutamicum LPECRS, which bears a deregulated aspartokinase and a lysine-
OFF riboswitch-controlled citrate synthase. Batch fermentation results showed that,
with the additional control of lysE by a lysine-ON riboswitch, the strain achieved a
21% increase in yield compared with strain C. glutamicum LPECRS,
and the concerted control by both OFF- and ON-type lysine riboswitches led to an
89% increase in yield compared with the strain carrying only the
deregulated aspartokinase [53].
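The conditional knockout behind a metabolic toggle switch rests on bistability. The following sketch uses the classic two-repressor toggle model of Gardner, Cantor and Collins (2000) with hypothetical parameters; it is not the construct of [73]. Here u could be read as a repressor that keeps gltA off and v as a repressor of u. A transient pulse that stops synthesis of v (standing in for an external switching signal) flips the system, and the new state persists after the pulse ends.

```python
def toggle_step(u, v, a_u, a_v, beta=2.0, gamma=2.0, dt=0.01):
    """One Euler step of du/dt = a_u/(1+v^beta) - u, dv/dt = a_v/(1+u^gamma) - v."""
    du = a_u / (1.0 + v ** beta) - u
    dv = a_v / (1.0 + u ** gamma) - v
    return u + du * dt, v + dv * dt


def run(u, v, duration, a_u=10.0, a_v=10.0, dt=0.01):
    """Integrate the toggle for a given duration with fixed synthesis rates."""
    for _ in range(int(round(duration / dt))):
        u, v = toggle_step(u, v, a_u, a_v, dt=dt)
    return u, v


# Start in the v-high state (e.g. gltA expressed during the growth phase).
u0, v0 = run(0.0, 5.0, duration=50.0)
print("before pulse:", round(u0, 2), round(v0, 2))
# Transient signal: synthesis of v switched off for 20 time units.
u1, v1 = run(u0, v0, duration=20.0, a_v=0.0)
# Signal removed; the system stays in the u-high state without further input.
u2, v2 = run(u1, v1, duration=100.0)
print("after pulse:", round(u2, 2), round(v2, 2))
```

The design point the sketch shows is memory: once flipped, the switch holds the "gltA OFF" state for the production phase without continuous inducer addition.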


4.3 Dynamic Control of Cell-Cell Communications 


Coordination between cell populations via prevailing metabolic cues has been 
noted as a promising approach to connect synthetic devices and drive phenotypic 
or product outcomes. To demonstrate this, “controller cells” have been developed 
by manipulating the molecular connection between cells via modulating the bacte- 
rial signal molecule, autoinducer-2 (AI-2), which is secreted as a quorum-sensing 
signal in many bacterial species [75]. Specifically, E. coli was engineered to 
overexpress components responsible for autoinducer uptake (lsrACDB), phosphor-
ylation (lsrK), and degradation (lsrFG). To characterize the dynamic balance
among the various uptake mechanisms, a simple mathematical model was
established that recapitulated the experimental data. Two controller “knobs” were
found to increase AI-2 uptake. One is the overexpression of the AI-2
transporter, LsrACDB, which controls removal of extracellular AI-2. The other is 
the overexpression of the AI-2 kinase, LsrK, which increases the net uptake rate by 
limiting secretion of AI-2 back into the extracellular environment. 
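The two controller knobs can be illustrated with a three-pool sketch. It is an assumed, simplified model, not the one in [75]: extracellular AI-2 (a_out) is taken up with rate constant k_uptake (standing for LsrACDB), intracellular unphosphorylated AI-2 (a_in) either leaks back out with rate k_efflux or is phosphorylated with rate k_kinase (standing for LsrK), and the phosphorylated form is degraded (standing for LsrFG) and cannot leave the cell. All rate constants are hypothetical.

```python
def simulate_ai2(k_uptake, k_kinase, k_efflux=0.5, k_degrade=0.3,
                 a_out0=1.0, t_end=50.0, dt=0.01):
    """Forward-Euler integration; returns extracellular AI-2 remaining at t_end."""
    a_out, a_in, a_p = a_out0, 0.0, 0.0
    for _ in range(int(round(t_end / dt))):
        uptake = k_uptake * a_out
        efflux = k_efflux * a_in           # unphosphorylated AI-2 can leave the cell
        phosphorylation = k_kinase * a_in  # phospho-AI-2 is trapped inside
        degradation = k_degrade * a_p
        a_out += (efflux - uptake) * dt
        a_in += (uptake - efflux - phosphorylation) * dt
        a_p += (phosphorylation - degradation) * dt
    return a_out


baseline = simulate_ai2(k_uptake=0.2, k_kinase=0.1)
more_transporter = simulate_ai2(k_uptake=0.4, k_kinase=0.1)  # knob 1: more transporter
more_kinase = simulate_ai2(k_uptake=0.2, k_kinase=0.3)       # knob 2: more kinase
print(round(baseline, 3), round(more_transporter, 3), round(more_kinase, 3))
```

In this sketch both knobs lower the remaining extracellular AI-2 by different routes: more transporter speeds removal from the medium, while more kinase traps internalized AI-2 before it can leak back out.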

With the quorum sensing system for cell density-dependent regulation of gene 
expression, a self-induced metabolic state switch was developed for microbial
isopropanol production [76]. To this end, a synthetic quorum sensing system was
constructed using a synthetic lux promoter and a positive feedback loop and used as
a tunable cell density sensor-regulator in E. coli. In this system, self-induction of a 
target gene expression is driven by quorum-sensing signals, and its threshold cell 
density can be changed depending on the concentration of a chemical inducer. This 
study demonstrated that auto-redirection of metabolic flux from central metabolic 
pathways toward a synthetic isopropanol pathway at a desired cell population led to 
a significant increase in isopropanol production. 
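How a chemical inducer can shift the threshold cell density can be shown with a quasi-steady-state sketch. The assumptions are hypothetical and not taken from [76]: the signal concentration is proportional to cell density times a per-cell synthesis rate that the inducer raises through a saturating function, the signal is degraded first-order, and the target gene responds through a Hill function with half-activation constant k_a. The positive feedback loop of the real circuit is omitted.

```python
def synthesis_rate(inducer, s_basal=0.1, s_max=0.9, k_inducer=1.0):
    """Per-cell signal synthesis rate, raised by the inducer (hypothetical)."""
    return s_basal + s_max * inducer / (inducer + k_inducer)


def signal_level(density, inducer, k_degrade=1.0):
    """Quasi-steady-state extracellular signal for a given cell density."""
    return synthesis_rate(inducer) * density / k_degrade


def activation(density, inducer, k_a=1.0, hill_n=2.0):
    """Fraction of maximal target-gene expression."""
    s = signal_level(density, inducer)
    return s ** hill_n / (s ** hill_n + k_a ** hill_n)


def threshold_density(inducer, k_a=1.0, k_degrade=1.0):
    """Cell density at which the target gene is half-activated."""
    return k_a * k_degrade / synthesis_rate(inducer)


for inducer in (0.0, 1.0, 10.0):
    n_star = threshold_density(inducer)
    print(inducer, round(n_star, 2), round(activation(n_star, inducer), 2))
```

Under these assumptions, raising the inducer lowers the density at which the switch fires (from 10 to about 1.1 density units here), which mirrors the tunable threshold described above.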


5 Perspectives 


Bioswitches are of great interest for the development of industrial strains with 
productivity high enough to compete with traditional chemical routes, especially 
when heterologous pathways are integrated into the host cells. Allosteric regulation 
and riboswitches, as the fundamental mechanisms in biology to control cellular 
metabolism and gene regulation, provide a variety of candidates with potential 
applications in dynamic control of metabolic fluxes. However, the dynamic 
response range of natural bioswitches needs to be engineered so that they can be 
used in industrial strain development where the effector concentration is usually 
much higher than that in a non-producer. Meanwhile, concerted dynamic regulation 
of metabolic pathways according to the need of cultivation process is often neces- 
sary for developing effective bioproduction strains and processes. Switchable 
regulating biomolecules that can sense the intracellular concentration of metabo- 
lites with different response types and dynamic ranges are required to enable 
concerted dynamic control of multiple pathways. In addition, novel bioswitches 
that can respond to non-natural effectors are needed to address the diversity of
regulatory targets encountered in metabolic engineering.

As illustrated above, engineering work can be conducted on different types of 
bioswitches to obtain satisfactory characteristics. In the case where allosteric 
enzymes are used as bioswitches, the engineering work has to be carried out at 
the protein level, which could be challenging in some cases because of the com- 
plicated relationship between protein structure and function. In addition, mod-
ifications at the protein level often cannot be transferred between
different allosteric proteins, and thus their applications for concerted control of
multiple pathways are limited. As for protein-based DNA regulators, besides 
modifications of the effector binding site at the protein level, their DNA binding 
specificity also has to be engineered to avoid the cross-regulation of unexpected 
genes, which may involve engineering at both the DNA and the protein level. From 
the perspective of engineering, the response profile of RNA-level bioswitches can
be engineered relatively easily, and their applications in dynamic control of multiple metabolic
pathways are expected to attract more attention.

In practice, rational approaches are desired for more efficient engineering of 
bioswitches although some of the engineering work can be accomplished by 
random mutagenesis and in vivo/in vitro screening. In this regard, computational
modeling with novel algorithms continues to contribute to the understanding of the
underlying mechanisms and to the development of bioswitches with novel properties. If
necessary, these bioswitches can be further improved by directed evolution methods [77]. On the
other hand, to guide the design and optimization of bioswitches, techniques such as
fast sampling and integrated on-line fast cell separation and quenching [78] can be
used to measure intracellular metabolite concentrations more accurately. As for
the regulatory points that should be controlled by the embedded bioswitches, they
can be identified by using approaches such as time-resolved 13C-labeled metabolic
flux analysis [79], which can also be utilized to investigate the regulatory effects of
these dynamic controls.


Abstract The microbial metabolic versatility found in nature has inspired scien-
tists to create microorganisms capable of producing value-added compounds. Many
efforts have been made to transfer and/or combine existing pathways, or even
engineered enzymes with new functions, into tractable microorganisms to generate
new metabolic routes for drug, biofuel, and specialty chemical production. How-
ever, the success of these pathways can be impeded by various complications, ranging from
an inherent failure of the pathway to cell perturbations. To overcome
these shortcomings, a wide variety of strategies have been developed. This chapter
will review the computational algorithms and experimental tools used to design
efficient metabolic routes and to construct and optimize biochemical pathways to
produce chemicals of high interest.


1 Introduction


During the last few decades, intensive exploitation of natural resources and increasing concerns about environmental pollution have motivated a growing interest in developing sustainable processes to produce fuels, commodity chemicals, and natural products [1, 2]. Microorganisms have emerged as suitable platforms for sustainable, environmentally friendly, and cost-effective processes to produce a whole range of compounds [1, 3-5]. In nature, microorganisms exhibit wide metabolic versatility, which allows them to produce a variety of chemicals. The scientific community has exploited this ability to develop microbial cell factories that synthesize desired chemicals.

In some cases, the chemical of interest is an endogenous metabolite and can be produced in the original organism. However, native pathways are usually tightly regulated and do not meet industrial productivity expectations. Therefore, overproduction of the desired compound can be achieved by metabolic engineering of the native host, for example, by channeling cellular fluxes toward the desired pathway or by modulating cellular regulatory networks. In other cases, natural pathways, or synthetic pathways combining enzymes from different organisms or even new enzymes, can be inserted into a more suitable heterologous host to produce the chemical of interest. Nevertheless, multi-enzyme pathways from different species may not function optimally in the desired host. The causes of low or no production of the desired molecule are often multifactorial. In some circumstances, the pathway itself inherently fails. In others, introducing such pathways into the cell generates perturbations such as growth impairment, accumulation of metabolites, generation of toxic intermediates, and oxidative stress, to name a few [1, 2, 6, 7]. Production of the target chemical can therefore be improved not only by optimizing the biochemical pathway but also by engineering the host microorganism. In the latter case, the overall metabolic performance of a cell may be improved by modulating gene expression on a genome scale using traditional gene deletion methods or more recent techniques involving small regulatory ribonucleic acids (RNAs) [1, 2, 8-10]. However, methods for engineering the host microorganism are beyond the scope of this chapter and are not discussed here.

In this chapter we discuss recent strategies used to design, engineer, and opti- 
mize biochemical pathways to produce chemicals of high interest. We describe 
computational algorithms used to design efficient metabolic routes and experimen- 
tal tools to construct and improve the efficiency of the designed pathway (Table 1). 


Table 1 List of tools for pathway engineering

Tool | Description | Advantages | References

Pathway design tools
Retropath | Pathway design containing circuits and self-regulation based on the specifications given to the program | Especially useful when regulatory elements are being included in the pathway design | [26, 152]
OptForce | Finds the metabolic engineering modifications to the flux of each reaction that improve production of the target | Overproduction of the target molecule by optimizing the flux of each reaction | [153, 154]
CORBAPy | Network-based algorithm that designs the elements of the network based on biological hypotheses | The network-based algorithm allows the discovery of unknown pathways | [155]
XTMS | Designs and scores the possible pathways for production of the target chemical | Scoring system for ranking the pathways reduces the number of constructs needed for characterization | [156]
Metabolic tinker | Searches for all thermodynamically possible paths between two compounds (source and target) | Enabling tool for discovering thermodynamically possible metabolic pathways | [157]
GEM-Path | Specific to E. coli; eliminates unfavorable pathways at each step of the search | Improved and fast search algorithm for pathways in E. coli | [158]

Pathway construction tools
BioBrick-based | Sequential assembly based on restriction digestion using standardized prefixes and suffixes | The availability of a comprehensive library of standardized BioBrick parts, coupled with its modularity, makes this method very powerful and flexible | [33-35]
Gibson assembly-based | Overlapping sequences at/near the ends of the DNA parts are simultaneously chewed back and repaired | Scarless, fast, and reliable assembly of multiple parts | [30, 38-40]
Ligase chain reaction | DNA bridges place the DNA parts next to each other and a thermostable ligase joins them | Modular method, especially useful for combinatorial assemblies | [42]
Golden Gate assembly-based | Iterative cycles of restriction and ligation using Type IIS endonucleases that cleave outside the recognition site, releasing tunable 4-bp overhangs | Scarless, fast, and reliable assembly of multiple parts | [43, 45-48]
DNA assembler | Leverages the yeast homologous recombination machinery to assemble parts with designed homology regions | Flexible, reliable, and recommended for large constructs | [49, 50]

Pathway optimization tools
Gene expression
Plasmid copy number | Modulate the copy number of the plasmid to reduce metabolic burden | Expression of different genes can be easily balanced | [60, 61]
Chromosomal integration (RAGE, CasEMBLR, Di-CRISPR) | Integrate the pathway into a specific region of the genome | Increases protein expression, genetic stability, and reproducibility; also reduces metabolic burden | [64, 69-71, 75]
Promoter strength | Engineer promoters spanning a range of strengths to modulate gene expression | Fine tuning of the expression of one or multiple genes | [76-82]
Transcriptional terminators | Engineer terminators with different strengths to modulate gene expression | Increase mRNA stability, allowing increased protein expression; also allow fine tuning of gene expression | [86, 87]
CRISPR-based methods | Use a modified CRISPR-Cas system to modulate gene expression | Allow precise temporal repression or activation of a gene | [88-93]
Codon optimization | Replacement of codons to match the host codon bias | Increases gene expression | [94-96]
Codon optimization | Change of codons to modify mRNA structure | Favors translation efficiency | [100-102]
Codon optimization | Randomization of codons | Disables hidden control elements | [27, 102]
RBS optimization | Optimization of the RBS with computational tools | Increases translation efficiency | [28, 36, 107-109]
Protein activity
Protein engineering | In vitro protein engineering to increase activity or modify substrate specificity | Increased activity bypasses low gene expression; improved substrate specificity increases catalytic efficiency and avoids side reactions | [13, 112]
Library of homologous proteins | Screen different protein homologues with different traits | Allows finding proteins with the best features | [117-119]
Cofactors | Increase cofactor expression levels or swap cofactor specificity | Reduces competition for cofactors | [118, 121-123]
Spatial localization
Scaffolds | Anchor the proteins of the pathway to a scaffold | Favors metabolite tunneling, avoiding diffusion | [127, 128]
Compartmentalization | Encapsulation of pathway enzymes in cellular organelles or bacterial microcompartments | Reduces metabolite diffusion, avoids metabolite transport and regulation, reduces toxicity, prevents competition for intermediates | [129, 130, 135]


2 Pathway Design 


Designing pathways for chemical biosynthesis in microorganisms requires in-depth knowledge of the enzymes catalyzing the reactions and of the physiology of the microorganisms themselves. In many cases, this information is incomplete because of the complex nature of biological systems. Traditionally, the design process consists of surveying the literature to find candidate genes and assembling those demonstrated to have the desired activities into a biochemical pathway. This is followed by the characterization and optimization of the designed pathways (Fig. 1). However, this process is usually inefficient, because a single person can analyze only a small number of genes and suboptimal choices are easily made. Designing pathways for the production of these chemicals is therefore difficult and time-consuming in many cases. Thus, a considerable number of software packages have been developed to overcome this shortcoming. In most cases, these packages generate a large list of enzyme series (pathways) that can potentially convert one or more of the abundant precursors available in the cell into the desired products [11]. These pathways are then sorted according to a wide range of criteria, and the best candidate pathways for the conversion are reported to the user. The chosen pathways are then constructed and characterized to find the most efficient one. Each pathway comprises regulatory elements such as promoters, ribosome binding sites (RBSs), and terminators, as well as the genes encoding the proteins of interest. Because all these parts depend greatly on the host in which the pathway is expressed, we first discuss the criteria for choosing the proper host before looking into the intricacies of pathway design.
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a ee Target molecule mama Choose the Host 


workflow from choosing the 
host to chemical production. 
After the host is chosen, the 
possible pathways are found 
and ranked and a few of the 
highest ranked pathways are 
constructed and 
characterized. After 
optimization of the best 
pathway, the production in 
high titers is achieved 


Optimize @am = High titer 
2.1 Choosing the Target Molecule and Host Organism 


As mentioned above, many different chemicals have been produced in microorganisms, ranging from antibiotics and natural products to commodity chemicals and fuels. For example, ethanol is produced on a very large scale for applications ranging from beverages to fuel. The target molecule is determined by the market, but the choice of production host for that molecule is key.

Depending on its ecological niche, each organism has evolved fitness advantages over others to ensure its survival and proliferation, and this survival strategy differs from one organism to the next. For example, Escherichia coli has an astonishingly fast growth rate and consumes available nutrients very quickly, rapidly outcompeting other strains in the culture. In contrast, Saccharomyces cerevisiae grows more slowly but produces ethanol, which kills most of the bacteria present in the same culture; the alcohol is later consumed as a carbon source. Yarrowia lipolytica has a rather interesting strategy and stores energy as intracellular lipids constituting up to 36% of its dry weight [12]. Given these differences, no single super host is best for the production of all target molecules, and the identity of the target molecule therefore plays a very important role in the choice of the production host. For example, S. cerevisiae is the ideal host for the production of ethanol and ethanol-derived chemicals [13-15], whereas Y. lipolytica is a great host for the production of fatty acid-derived products [16].

Another consideration when choosing the host is the danger it might pose to the end user of the product. If the final product is intended for use as a food additive, the microbial host should preferably have been granted GRAS status (Generally Recognized as Safe) by the FDA (U.S. Food and Drug Administration), QPS status (Qualified Presumption of Safety) by EFSA (European Food Safety Authority), or a similar designation. Such a host more readily fulfills safety requirements, such as ensuring the absence of adverse health effects arising from the presence of endotoxins and emetic toxins.

Host choice greatly affects the design of the pathway and the performance of the strain in the production setting. The availability of metabolic engineering tools is another factor to take into account when selecting the host. Two of the most commonly used industrial microorganisms are E. coli and S. cerevisiae. The extensive metabolic engineering toolkit for these organisms is one of the major reasons they are preferred as production hosts. These hosts have been extensively studied and engineered to produce a wide variety of products from different feedstocks [17-20]. A detailed comparison between them and other alternative hosts, including the advantages and disadvantages of each system, is reviewed elsewhere [21].

The choice of host can have a significant impact on the choice of pathways and enzymes for the production of the desired chemical. Even though over half of the gene products involved in the small molecule metabolism of E. coli and yeast have been shown to carry out common reactions [22], the regulatory elements differ widely between the two organisms [21]. Even different strains of the same microorganism can behave differently [23]. The host also determines the regulatory elements (promoters, terminators, and RBSs), codon preferences, maturation modifications, and the secretion machinery. Fisher and co-workers suggested six factors for choosing the host [21], which we summarize in Table 2.


Table 2 Guidelines for choosing a proper host

Criterion | Guideline
Metabolic resources | Abundant precursors and cofactors for the pathway of interest
Minimum metabolic adjustments | Ideally, choose hosts with the characteristics desired for production of the final product, for example, producing ethanol in yeast and taking advantage of robust endogenous pathways
Product secretion | Good secretion ability in the host for the product of interest
Toxicity of products | Ideally, none of the products or intermediates are toxic to the cell
Genomic toolset and cultivation conditions | Facile tools are available for genetic modification and engineering; cultivation conditions, including oxygen demand, optimum growth media, and temperature, are not too difficult on an industrial scale
Proper enzyme folding | The protein of interest can be properly expressed and folded in the host of choice


2.2 In Silico Pathway Design


Once the host is chosen, the pathway design process begins. Engineering the host to produce the target molecule in industrial quantities is challenging and requires careful consideration. Traditionally, a few pathways are selected based on similar pathways in the literature. However, because of the large and growing number of possible pathways, manually picking the best one is inefficient and impractical. To solve this problem, a myriad of bioinformatics tools have been developed that search public databases to design and rank possible pathways producing the target molecule. These tools search enzyme databases such as BRENDA [24] and, by finding the enzymes that can possibly catalyze each reaction, generate a large number of potential pathways, many of which do not exist in nature.
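The search step can be pictured as a backward walk through a reaction network. The Python sketch below is a minimal illustration under loudly stated assumptions: the reaction table, compound names, and enzyme names are hypothetical toy data (not entries from BRENDA or output from any tool in Table 1), each reaction has exactly one substrate and one product, and the search stops at metabolites assumed to be abundant in the host.

```python
from collections import deque

# Hypothetical toy reactions: enzyme -> (substrate, product).
REACTIONS = {
    "enzA": ("pyruvate", "X1"),
    "enzB": ("X1", "X2"),
    "enzC": ("X2", "target"),
    "enzD": ("acetyl-CoA", "X3"),
    "enzE": ("X3", "target"),
    "enzF": ("X1", "X3"),
}
HOST_PRECURSORS = {"pyruvate", "acetyl-CoA"}  # assumed abundant in the host


def enumerate_pathways(target, max_steps=4):
    """Breadth-first backward search from the target to host precursors.

    Returns pathways as ordered enzyme lists running from a host precursor
    to the target.
    """
    found = []
    # Queue entries: (compound still to be made, enzymes collected so far,
    # listed from the target backwards).
    queue = deque([(target, [])])
    while queue:
        compound, enzymes = queue.popleft()
        if compound in HOST_PRECURSORS:
            found.append(list(reversed(enzymes)))
            continue
        if len(enzymes) == max_steps:
            continue
        for enzyme, (substrate, product) in REACTIONS.items():
            # Extend only with enzymes that make this compound and are not reused.
            if product == compound and enzyme not in enzymes:
                queue.append((substrate, enzymes + [enzyme]))
    return found


for pathway in enumerate_pathways("target"):
    print(" -> ".join(pathway))
```

Real tools work on far larger networks with multi-substrate reactions, which is how millions of candidate pathways can arise from a single target.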
Assembling all of these pathways is neither practical nor necessary, so the next step is to identify the potential pathways with higher chances of success and construct them. The best pathway is not necessarily a well-known pathway in nature; it may be a combination of genes from different organisms. Not being constrained to native genes, or to obtaining all of the genes from one source, has its advantages. The best pathway is usually chosen based on the specificity of the enzymes for the desired reaction, the number of enzymes involved in the pathway, the thermodynamic favorability of the reactions in the pathway, and the toxicity of the intermediates to the cell [25].
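The four ranking criteria just listed can be folded into a single score. The sketch below is hypothetical throughout: the per-step annotations (specificity, free energy, intermediate toxicity), the candidate pathways, and the weights are invented for illustration, and published tools use their own, more elaborate scoring functions.

```python
from dataclasses import dataclass


@dataclass
class Step:
    enzyme: str
    specificity: float        # assumed fraction of activity on the desired substrate (0-1)
    delta_g: float            # assumed reaction free energy, kJ/mol (negative = favorable)
    toxic_intermediate: bool  # assumed: is this step's product toxic to the host?


def score(pathway, w_len=1.0, w_spec=2.0, w_dg=0.05, w_tox=3.0):
    """Higher is better; the weights are illustrative only."""
    n_steps = len(pathway)
    mean_spec = sum(s.specificity for s in pathway) / n_steps
    total_dg = sum(s.delta_g for s in pathway)
    n_toxic = sum(s.toxic_intermediate for s in pathway)
    return (-w_len * n_steps        # fewer enzymes is better
            + w_spec * mean_spec    # more specific enzymes are better
            - w_dg * total_dg       # a more negative total free energy is better
            - w_tox * n_toxic)      # toxic intermediates are penalized


candidates = {
    "P1": [Step("enzD", 0.9, -20.0, False), Step("enzE", 0.6, -5.0, False)],
    "P2": [Step("enzA", 0.95, -30.0, False), Step("enzB", 0.9, -10.0, True),
           Step("enzC", 0.9, -15.0, False)],
    "P3": [Step("enzA", 0.95, -30.0, False), Step("enzF", 0.5, 10.0, False),
           Step("enzE", 0.6, -5.0, False)],
}

ranking = sorted(candidates, key=lambda name: score(candidates[name]), reverse=True)
print(ranking)
```

A weighted sum is the simplest possible design choice; it makes the trade-offs explicit (here, one toxic intermediate outweighs a large thermodynamic advantage) but the ranking is only as good as the annotations and weights fed into it.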

There are many programs for pathway design, and each uses a different search algorithm and ranking strategy. Some of these programs are listed in Table 1. These computational tools have been successfully used in an extensive range of applications. In one example [26], Retropath was used to find pathways for production of the flavonoid pinocembrin. By searching the enzymes in the database, nine million pathways that could potentially produce this compound were predicted. This list was then narrowed down to 12 highly ranked pathways, which were constructed and characterized. The metabolic network was then optimized using Retropath, and a 17-fold improvement in the final titer was achieved.

Other elements of the pathway have been modeled and characterized too. For example, thanks to the modeling of different RBSs, translation initiation rates can now be predicted with high accuracy [27]. This model was used to create a library of RBSs with different strengths and to achieve a wide dynamic range of translation levels for proteins of interest. This RBS library calculator was then combined with a system-level kinetic model, and 73 different variants of a pathway were designed, built, and characterized [28].
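As a hedged illustration of how such a model turns RBS design into expression levels: RBS Calculator-style models are commonly summarized as r ∝ exp(-βΔG_tot), where ΔG_tot is the total free energy change of ribosome binding and β is an empirical constant (about 0.45 mol/kcal in the original RBS Calculator work). We assume this form here; whether [27] and [28] use exactly these constants is not stated in this chapter, and the library ΔG values below are hypothetical.

```python
import math

BETA = 0.45  # mol/kcal; assumed apparent Boltzmann factor of the free-energy model


def relative_rate(dg_total):
    """Initiation rate relative to an RBS with dG_total = 0 kcal/mol (r ~ exp(-beta*dG))."""
    return math.exp(-BETA * dg_total)


def dg_shift_for_fold(fold):
    """Change in dG_total (kcal/mol) that multiplies the rate by `fold`."""
    return -math.log(fold) / BETA


# A hypothetical RBS library spanning dG_total from -8 to +8 kcal/mol.
library = [-8.0, -4.0, 0.0, 4.0, 8.0]
rates = [relative_rate(dg) for dg in library]
dynamic_range = max(rates) / min(rates)
print(f"dynamic range: {dynamic_range:.0f}-fold")
print(f"10-fold stronger RBS needs a dG shift of {dg_shift_for_fold(10):.2f} kcal/mol")
```

Under these assumptions, a shift of about 5.1 kcal/mol changes the rate tenfold, and a library spanning 16 kcal/mol covers roughly a 1,300-fold range of translation levels.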


3 Pathway Construction 


After the pathways of interest are selected and designed, they have to be assembled and constructed. Deoxyribonucleic acid (DNA) assembly strategies have been under development for a long time and have progressed from restriction digestion/ligation to more sophisticated seamless multi-part assembly methods [29]. Using the newly developed techniques, DNA constructs as large as entire genomes and with as many as 25 parts have been assembled [30]. Because of the need for high-throughput assembly and the sophistication of some of these DNA assembly techniques, many online tools have been developed to facilitate and optimize the design and assembly strategy for a specific construct. DNA assembly methods have been extensively reviewed [29, 31]; in this section we mostly focus on the more recent assembly methods and on web tools that help select and plan the best assembly strategy.


Pathway Design, Engineering, and Optimization 85 


3.1 Methods Based on Restriction Digestion/Ligation 
3.1.1 BioBrick-Based Methods 


Because of the complexity of biological parts and assembly strategies, extensive efforts have been put into modularizing biological parts. The idea of standard DNA parts, or bricks, and of tools to assemble them easily was first introduced in 1996 [32], but the term BioBrick was first used by Tom Knight, and the assembly strategy was published later [33]. In the BioBrick method, all parts are stored in circular plasmids that are easily amplified. The restriction sites of EcoRI and XbaI are used in the prefix and those of SpeI and PstI in the suffix; because XbaI and SpeI produce compatible sticky ends, two parts can be joined at their XbaI and SpeI ends. The parts are subsequently assembled by digestion and ligation, and iterative digestion and ligation allows the assembly of multiple parts (Fig. 2a). These standard parts are commonly used and stored in continuously updated databases. Over 2,000 parts are now available, with more parts being designed and added to the database by researchers around the world.
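The compatibility of the XbaI and SpeI ends, and the reason their junction cannot be re-cut, can be checked in a few lines. The sketch assumes the standard BioBrick recognition sites (EcoRI GAATTC, XbaI TCTAGA, SpeI ACTAGT, PstI CTGCAG) and their textbook top-strand cut positions; the helper names are hypothetical.

```python
# Recognition site and top-strand cut offset (cut after N bases) for the four
# BioBrick enzymes; all four sites are 6-bp palindromes.
ENZYMES = {
    "EcoRI": ("GAATTC", 1),  # G^AATTC -> 5' AATT overhang
    "XbaI": ("TCTAGA", 1),   # T^CTAGA -> 5' CTAG overhang
    "SpeI": ("ACTAGT", 1),   # A^CTAGT -> 5' CTAG overhang
    "PstI": ("CTGCAG", 5),   # CTGCA^G -> 3' TGCA overhang
}


def overhang(enzyme):
    """Single-stranded overhang sequence and its polarity."""
    site, cut = ENZYMES[enzyme]
    other = len(site) - cut  # bottom-strand cut position for a palindromic site
    lo, hi = min(cut, other), max(cut, other)
    kind = "5'" if cut < other else "3'"
    return site[lo:hi], kind


def compatible(a, b):
    """Two ends can ligate if they carry the same overhang with the same polarity."""
    return overhang(a) == overhang(b)


def ligation_junction(left_enzyme, right_enzyme):
    """Top-strand sequence formed when a left end cut by one enzyme is ligated
    to a right end cut by another (compatible) enzyme."""
    if not compatible(left_enzyme, right_enzyme):
        raise ValueError("incompatible sticky ends")
    left_site, left_cut = ENZYMES[left_enzyme]
    right_site, right_cut = ENZYMES[right_enzyme]
    return left_site[:left_cut] + right_site[right_cut:]


scar = ligation_junction("SpeI", "XbaI")
print(scar, [name for name, (site, _) in ENZYMES.items() if site == scar])
```

Because the ACTAGA scar matches neither the XbaI nor the SpeI site, the composite part again carries a single prefix and suffix, which is what makes iterative assembly possible.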
Fig. 2 DNA assembly methods. (a) In BioBrick assembly, prefixes and suffixes are used to assemble parts in order. Four enzymes, EcoRI (E), XbaI (X), SpeI (S), and PstI (P), provide the double digestion sites in the prefix and suffix regions. The correct set of prefix/suffix has to be chosen for each step, and the final DNA molecule contains the prefix and suffix for further assembly rounds and the addition of new parts. (b) In Gibson assembly, the T5 exonuclease digests the 5' ends of the DNA parts and the digested pieces are annealed and ligated to form the final construct. (c) In the Golden Gate assembly method, a Type IIS endonuclease such as the commonly used BsaI digests the region next to the recognition site (shown here in blue), generating a 4-bp overhang. In the ligation step the matching overhangs ligate together, resulting in the assembly of the DNA fragments in the designed order. (d) In the DNA assembler method, DNA parts with homology are recombined by the cellular homologous recombination machinery in S. cerevisiae.
There have been many modifications and improvements of the BioBrick method to make the strategy more flexible and useful. ePathBrick is one of these methods, in which the same principle of irreversible digestion/ligation was expanded to four restriction enzymes. This change makes it easier to assemble combinatorial pathways; in one example, the developers assembled seven parts on a single ePathBrick vector, and different variations of the same pathway were assembled to generate 54 different vectors [34]. This method was later widely used to construct combinatorial pathways to investigate the effect of each DNA part on the whole pathway. Changing different parts of a pathway and observing the effect on production provides valuable insight into the functional contribution of the individual parts. For example, the ePathBrick method was used to assemble 18 plasmids with different combinations of a three-gene catechin biosynthetic pathway. Three variants of each of the first two genes and two variants of the last gene were tested and characterized, and the best combination was identified [35]. In another study, ePathBrick was used to optimize the transcription rates of genes involved in the fatty acid biosynthetic pathway [36]. The entire pathway was divided into three modules, and each module was transcribed from a different promoter. Changing the promoter regulating each module enabled the researchers to identify the bottlenecks of the pathway and to relieve them by fine tuning the transcription levels. These applications show that the modularity of an assembly method is very important and can lead to useful applications.
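The size of such a combinatorial library is the product of the number of variants at each position; for the catechin example, 3 × 3 × 2 = 18 plasmids. A minimal sketch that enumerates the combinations, using hypothetical variant labels:

```python
from itertools import product
from math import prod

# Hypothetical labels: three variants each of the first two genes and two of
# the last gene, mirroring the catechin example.
variants = {
    "gene1": ["g1_a", "g1_b", "g1_c"],
    "gene2": ["g2_a", "g2_b", "g2_c"],
    "gene3": ["g3_a", "g3_b"],
}

# Each combination is one plasmid to assemble and test.
library = [dict(zip(variants, combo)) for combo in product(*variants.values())]

print(prod(len(v) for v in variants.values()), len(library))
print(library[0])
```

Because the count grows multiplicatively, adding even one more two-variant position doubles the number of constructs, which is why modular assembly formats matter so much for this kind of study.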

One of the problems with BioBrick assembly is its reliance on sequence-specific restriction enzymes. Because the recognition sequences of these enzymes are rather short, they are likely to be present in the genes to be cloned. A six-base-pair recognition site, for example, is expected to appear by chance roughly every 4 kb (4^6 = 4,096 bp), which makes this method troublesome for longer constructs. Traditionally, synonymous point mutations are introduced to remove the pre-existing cut sites in the genes so that they are no longer recognized by the restriction enzymes [33].

An alternative method called iBrick, described in a recent paper, solves this problem to a great extent [37]. In this method, two homing endonucleases, I-SceI and PI-PspI, with very long (>18 bp) recognition sites are used. Using these enzymes greatly reduces the probability of their recognition sites occurring within genes and enables users to construct longer pathways without modifying the sequences of the genes involved. Using iBrick, carotenoid (~4 kb) and actinorhodin (~20 kb) biosynthetic clusters were constructed without introducing point mutations.
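The arithmetic behind the last two paragraphs is simple: in random sequence with equal base frequencies, a specific palindromic n-bp site is expected about once every 4^n bp. The sketch below (hypothetical helper names and example part; real genes deviate from the random-sequence assumption) computes this spacing and scans a part on both strands for sites that would need to be removed.

```python
def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def expected_spacing(site_len):
    """Mean spacing (bp) between chance occurrences of one palindromic site
    in random DNA with equal base frequencies."""
    return 4 ** site_len


def find_sites(seq, site):
    """0-based start positions of `site` on either strand of `seq`."""
    seq = seq.upper()
    hits = set()
    for probe in {site, revcomp(site)}:  # a single probe if the site is palindromic
        start = seq.find(probe)
        while start != -1:
            hits.add(start)
            start = seq.find(probe, start + 1)
    return sorted(hits)


part = "ATGGAATTCAAACTGCAGTAA"  # hypothetical part with internal EcoRI and PstI sites
for name, site in {"EcoRI": "GAATTC", "PstI": "CTGCAG", "XbaI": "TCTAGA"}.items():
    print(name, find_sites(part, site))
print(expected_spacing(6), expected_spacing(18))
```

For a 6-bp site the expected spacing is 4,096 bp, matching the ~4 kb figure for BioBrick enzymes, whereas an 18-bp site is expected only about once every 6.9 × 10^10 bp, which is why long-site enzymes rarely require sequence changes.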


3.1.2 Gibson Assembly-Based Methods 


The Gibson assembly method was developed by Daniel Gibson in 2009 [30, 38, 39]. This method allows scarless, single-pot assembly of multiple parts at the same time. The parts being assembled usually share around 25 bp of homology, which guides the assembly (Fig. 2b). After the parts are mixed with T5 exonuclease, this enzyme starts digesting one strand of each part (chew back), and Phusion polymerase follows the exonuclease, repairing the DNA parts. In this process, the flanking regions anneal to each other; as the exonuclease is heat inactivated and Phusion polymerase catches up, Taq DNA ligase seals the remaining nicks, and the reaction is completed in the same buffer at a constant temperature of 50 °C. This isothermal assembly method was used to assemble 25 DNA fragments, constituting the entire Mycoplasma genitalium genome [40]. The Gibson Assembly Kit is commercially available from New England Biolabs (Ipswich, MA), and many web tools are available for designing the overlapping regions between the DNA parts.
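Designing the overlaps is largely bookkeeping. The sketch below shows one simple scheme, assuming a circular construct, the ~25-bp overlap mentioned above, and a fixed 20-nt annealing region (an assumption; practical design tools choose primer lengths by melting temperature). All homology is placed on the forward primers; the function name and the random example parts are hypothetical.

```python
import random


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def gibson_primers(parts, overlap=25, anneal=20):
    """Forward/reverse primers for joining `parts` into a circular construct.

    Simplification: all homology is carried on the forward primer, whose 5'
    tail copies the last `overlap` bases of the upstream part, so neighbouring
    PCR products share exactly `overlap` bp.
    """
    designs = []
    for i, part in enumerate(parts):
        upstream = parts[i - 1]  # for i == 0 this is the last part, closing the circle
        designs.append({
            "fwd": upstream[-overlap:] + part[:anneal],
            "rev": revcomp(part[-anneal:]),
            "amplicon": upstream[-overlap:] + part,
        })
    return designs


rng = random.Random(0)
parts = ["".join(rng.choice("ACGT") for _ in range(80)) for _ in range(3)]
designs = gibson_primers(parts)
for i, d in enumerate(designs):
    nxt = designs[(i + 1) % len(designs)]
    # The last 25 bp of each PCR product match the first 25 bp of the next one.
    print(i, len(d["fwd"]), d["amplicon"][-25:] == nxt["amplicon"][:25])
```

Splitting the overlap between both primers is an equally valid choice that keeps primers shorter; putting it all on one side simply makes the bookkeeping easier to follow.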

One of the shortcomings of Gibson assembly-based methods is that two adjacent parts must share homology regions with each other. The promise of synthetic biology is modular design, and many protocols depend on this modularity, which allows better and easier construction of combinatorial assemblies. For example, in many studies different homologs of a gene have to be cloned into a pathway in multiple assemblies. However, with Gibson assembly-based methods, because each pair of adjacent parts shares a short homology region, changing one part in the assembly requires changing its adjacent parts as well, which becomes problematic in large libraries of constructs. This inherent shortcoming can be overcome by designing linkers between the parts. By adding a short DNA sequence (linker) before and after each part, the assembly becomes independent of the sequence of the parts, and anything with the appropriate linkers can be inserted in the proper location. Designing linkers can be tricky because the orthogonality of the linkers can make a huge difference to the assembly: decreasing the homology between linkers reduces the percentage of misassembly. R2oDNA Designer [33] is an online tool for designing such linkers with improved efficiency. The optimized linkers were used with three homology-based assembly methods, and efficiencies of more than 75% were reported [33].
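One crude way to screen for linker crosstalk is to compute the longest exact run shared by every pair of linkers on either strand and flag pairs above a threshold. This is only a sketch of the idea, not the algorithm of the online tool mentioned above; the 8-bp threshold and the linker sequences are hypothetical.

```python
from itertools import combinations


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def longest_shared_run(a, b):
    """Length of the longest exact substring common to a and b (dynamic programming)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def crosstalk(linkers, max_shared=8):
    """Flag linker pairs sharing a run longer than max_shared bp on either strand."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(linkers.items(), 2):
        shared = max(longest_shared_run(a, b), longest_shared_run(a, revcomp(b)))
        if shared > max_shared:
            flagged.append((name_a, name_b, shared))
    return flagged


linkers = {  # hypothetical 20-bp linkers; L3 deliberately resembles L1
    "L1": "ACGTTGCAAGGTCCATGACT",
    "L2": "TTGACCGATAGCTAGGCATC",
    "L3": "ACGTTGCAAGGTCTAGTGCA",
}
for a, b, shared in crosstalk(linkers):
    print(f"{a} and {b} share a {shared}-bp run; consider redesigning one of them")
```

Exact shared runs are a rough proxy; a fuller check would also score mismatched and gapped alignments, since recombination and annealing tolerate imperfect homology.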

There have been many modifications and improvements of the Gibson assembly protocol. One of them combines BioBricks and Gibson assembly, providing both the multi-part assembly of the Gibson method and the modularity of BioBricks [41]. In this method, a long linker was designed between each pair of parts to be assembled, and all of the parts were assembled together using Gibson assembly. Notably, because the sequence upstream of the start codon is sensitive, the whole RBS region was used as the overlap, whereas a dedicated overlap sequence was added between the terminator and the coding sequence. Using this method, PCR-amplified parts with BioBrick-style linkers were generated, and a randomized library with different promoters, genes, and terminators for the lycopene biosynthetic pathway was constructed, with a 200-fold difference in expression level between the constructs [41].


3.1.3 Ligase Chain Reaction-Based Methods 


Ligase chain reaction (LCR) is an innovative, scarless, ligation-based method optimized by scientists from Amyris (Emeryville, CA). In this method, a "bridge" oligonucleotide is designed with homology to the ends of the two parts to be assembled. The reaction temperature is raised to denature the DNA; when the temperature is lowered, the bridge anneals to both fragments, and the two fragments are ligated together by a thermostable DNA ligase. This cycle is then repeated, and the assembled fragments serve as templates for the next ligation reaction. This assembly technique is very versatile, and any combination of parts can be assembled without pre-processing or designing specific overhangs. Because the assembly does not involve amplification, the mutation rate is very low (less than one mutation per >50 kb). It was reported that Gibson assembly could not assemble constructs of four or more parts with an efficiency above 50%, whereas LCR could assemble up to 12 DNA pieces with more than 60% efficiency. Thirteen factors affecting the efficiency of the LCR method have been experimentally optimized to define the conditions for LCR assembly [42].
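Each junction in LCR needs one bridging oligonucleotide whose two halves pair with the end of one part and the start of the next. The sketch below assumes fixed 20-nt halves (an assumption; bridges are normally designed to a target melting temperature), a circular construct by default, and hypothetical helper names and parts.

```python
def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def bridge_oligo(left_part, right_part, half=20):
    """Single-stranded bridge spanning the junction left_part|right_part.

    Written 5'->3', its first half pairs with the start of right_part and its
    second half with the end of left_part, holding the two ends together for
    the thermostable ligase.
    """
    junction = left_part[-half:] + right_part[:half]
    return revcomp(junction)


def bridges_for(parts, half=20, circular=True):
    """One bridge per junction, plus a closing bridge for a circular construct."""
    pairs = list(zip(parts, parts[1:]))
    if circular:
        pairs.append((parts[-1], parts[0]))
    return [bridge_oligo(a, b, half) for a, b in pairs]


parts = ["ATGC" * 10, "GGCC" * 10, "TTAG" * 10]  # hypothetical 40-bp parts
for bridge in bridges_for(parts):
    print(len(bridge), bridge)
```

Because each bridge is an independent oligonucleotide, swapping a part only requires new bridges for its two junctions, which is the source of the method's flexibility for combinatorial assemblies.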


3.1.4 Golden Gate-Based Assembly Methods 


The Golden Gate method relies on digestion with Type IIS endonucleases, which cut at a defined distance outside their recognition site. The advantage of this mode of cleavage is that the sequence of the cut site is independent of the recognition site, so the resulting four-base overhang is customizable. This flexibility in choosing specific overhang sequences enables the user to design a different overhang for each junction. In the subsequent ligation step, the exposed sticky ends with complementary overhangs are ligated. Consequently, the desired parts can be assembled in the desired order (Fig. 2c) [43].

Because of its flexibility and modularity, this method was quickly adopted as a gold standard for DNA assembly. Researchers have used it for a myriad of applications, from large-scale TALEN synthesis [44] to natural product discovery [31]. In the first of these studies, one-pot, single-step assembly of 13 DNA fragments was performed with an efficiency of ~98%, demonstrating the capability and robustness of the Golden Gate assembly method.

Because of the widespread use of this method, many variations and improvements have been developed. One prominent example is MoClo [45, 46], a modular cloning system based on hierarchical assembly. In a first step, individual parts such as promoters, RBSs, CDSs, and terminators, each stored in a level 0 plasmid, are assembled into transcription units (level 1 assembly). Subsequently, these cassettes are assembled together in a second level of assembly (level 2 assembly). Iterative cycles of higher-level assembly produce larger cassettes, making MoClo a powerful method for hierarchical assembly of large plasmids. A comprehensive toolkit was similarly developed for S. cerevisiae, in which a set of characterized parts such as promoters, CDSs, and terminators is available in Golden Gate-ready plasmids [47]. As with BioBrick assembly, standardized part libraries based on these methods have been deposited in Addgene and are available to the public [47, 48].

Despite the advantages of the Golden Gate method, one major limitation may hamper its extensive use: the presence of the Type IIS endonuclease recognition site within the parts to be assembled greatly reduces the efficiency of the assembly reaction. Therefore, such internal sites should be removed whenever possible. However, as with BioBrick-based methods, this has to be done only once, and the repaired part can be reused for multiple assemblies.
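Repairing a part usually means introducing a synonymous point mutation inside the site, so the encoded protein is unchanged. The sketch below assumes an in-frame coding sequence, the standard genetic code, and BsaI (GGTCTC) as the example Type IIS site; it applies the first synonymous single-codon swap that reduces the number of site copies on either strand and ignores codon usage (a simplification a practical domestication tool would not make). Function names are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI table 1); codons enumerated with the first base
# varying slowest, in T, C, A, G order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO)}


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(cds):
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


def site_copies(seq, site):
    """Copies of `site` on both strands (a palindromic site is counted once)."""
    rc = revcomp(site)
    return seq.count(site) + (seq.count(rc) if rc != site else 0)


def _first_improving_swap(cds, site, pos):
    """Return cds with one synonymous codon swap that lowers site copies, or None."""
    before = site_copies(cds, site)
    # Try every codon overlapping the site occurrence starting at `pos`.
    for k in range(pos // 3, (pos + len(site) - 1) // 3 + 1):
        codon = cds[3 * k:3 * k + 3]
        for alt, aa in CODON_TABLE.items():
            if aa != CODON_TABLE[codon] or alt == codon:
                continue
            candidate = cds[:3 * k] + alt + cds[3 * k + 3:]
            if site_copies(candidate, site) < before:
                return candidate
    return None


def domesticate(cds, site="GGTCTC"):
    """Remove internal copies of `site` (default: BsaI) by synonymous codon swaps."""
    cds = cds.upper()
    while site_copies(cds, site):
        pos = cds.find(site)
        if pos == -1:
            pos = cds.find(revcomp(site))
        fixed = _first_improving_swap(cds, site, pos)
        if fixed is None:
            raise ValueError(f"no single synonymous swap removes the site at {pos}")
        cds = fixed
    return cds


original = "ATGGGTCTCTAA"  # hypothetical CDS (ATG GGT CTC TAA) with an internal BsaI site
repaired = domesticate(original)
print(repaired, translate(original), translate(repaired))
```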


3.2 In Vivo Recombination-Based Methods


Homologous recombination allows multiple DNA parts that share homology with
each other to be assembled inside the cell. Cells use this process to repair DNA
breaks that occasionally occur in the genome. When DNA pieces sharing homology
are transformed into yeast, the cells recognize this homology and assemble the
plasmid of interest by homologous recombination [49]. This method, also known as
DNA assembler, works by extending each part by ~40 bp so that it shares a homology
region with the adjacent part. This homology region is then recognized by the
homologous recombination machinery, and the parts are assembled. Once the parts
are assembled and selection is applied, only the cells carrying circular plasmids
survive (Fig. 2d). Note that because this method relies on sequence homology, the
parts are designed much as they would be for Gibson assembly. The flexibility of
this method and its ability to construct large plasmids are major advantages, but
the slow growth of yeast cells and possible misassemblies caused by similarity
between homologous parts are its limitations. Using this method, Shao and
coworkers assembled large constructs, including a ~9-kb xylose utilization
pathway, an ~11-kb zeaxanthin pathway, and a plasmid containing both of these
pathways, with more than 70% efficiency [49].
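The overlap design described above can be sketched as follows. This is a simplified illustration that assumes every part is extended only at its 5' end with the last 40 bp of the preceding part; in practice the homology is usually added with PCR primers and may be split between both partners. `add_homology_arms` and `join_by_overlap` are hypothetical helpers, not part of any published tool.

```python
def add_homology_arms(parts: list[str], overlap: int = 40, circular: bool = True) -> list[str]:
    """Prefix each part with the last `overlap` bases of the preceding part."""
    if any(len(p) < overlap for p in parts):
        raise ValueError("every part must be at least as long as the overlap")
    fragments = []
    for i, part in enumerate(parts):
        if i == 0 and not circular:
            fragments.append(part)  # a linear construct has no upstream partner
        else:
            # For i == 0, parts[-1] wraps around to the last part, closing the circle.
            fragments.append(parts[i - 1][-overlap:] + part)
    return fragments


def join_by_overlap(fragments: list[str], overlap: int = 40) -> str:
    """Mimic recombination by merging each fragment onto the growing product."""
    product = fragments[0]
    for frag in fragments[1:]:
        if product[-overlap:] != frag[:overlap]:
            raise ValueError("adjacent fragments do not share the expected homology")
        product += frag[overlap:]
    return product
```

Because the check in `join_by_overlap` only compares sequences, it also makes the misassembly risk visible: if two unrelated parts happen to share the same 40-bp ends, nothing in the homology alone tells the cell which partner is correct.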

This method was later modified to improve its efficiency. In one report, the
origin of replication and the selection marker were separated, and each served as
an additional part in the DNA assembly. Because both of these parts have to be
present in the assembled construct, some transformants harboring misassembled
plasmids cannot survive, and fewer false positives are observed. This strategy
resulted in a 100-fold decrease in false-positive transformants compared with the
original DNA assembler method [50].

Because DNA assembler is a powerful method for assembling large constructs,
many studies have used it to build large plasmids, many of which exceed 20 kb
[17, 51-55]. Nonetheless, as the number of genes in the pathway increases, the
percentage of correct constructs decreases and more colonies have to be picked
to find the correct one. Using fewer but larger DNA pieces therefore appears to
be a good strategy for obtaining fewer false positives with DNA assembler. One
way to achieve this is to combine in vitro and in vivo assembly methods. Yuan and
coworkers [56] assembled plasmids of ~13, 22, and 44 kb by first joining small
pieces of their construct using the LCR method and then assembling the larger
pieces using DNA assembler. By combining these two methods, they achieved an
impressive fidelity of 71% for the 44-kb construct.
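The drop in correct clones as the number of pieces grows can be illustrated with a deliberately simple model. It assumes, hypothetically, that each junction of a circular construct forms correctly and independently with the same probability p, so the fraction of correct clones is p raised to the number of junctions; the values of p are illustrative, not measured.

```python
import math


def fraction_correct(n_pieces: int, p_junction: float) -> float:
    """Fraction of correct clones if each of the n junctions of a circular
    construct forms correctly and independently with probability p_junction."""
    return p_junction ** n_pieces


def colonies_to_screen(f_correct: float, confidence: float = 0.95) -> int:
    """Smallest number of colonies k with P(at least one correct) >= confidence,
    i.e. the smallest k satisfying 1 - (1 - f)**k >= confidence."""
    if f_correct <= 0:
        raise ValueError("no correct clones expected")
    if f_correct >= 1:
        return 1
    return math.ceil(math.log(1 - confidence) / math.log(1 - f_correct))
```

With p = 0.95, a four-piece assembly needs two colonies for 95% confidence of picking a correct clone, whereas a ten-piece assembly needs four, which is the qualitative reason fewer, larger pieces help.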


3.3 In Silico Design of DNA Assembly


Given the variety of the DNA assembly methods described above and the
differences between them, choosing the right one can be tricky. Some assembly
methods perform best with larger pieces and others with larger numbers of
pieces. Sometimes it is easier and even more cost-effective to synthesize some of
the parts, but sometimes it is not; the j5 DNA design software, for example,
suggests when DNA synthesis is cost-effective. In some cases the success rate of
the assembly also depends on the sequence of the parts being assembled. For
instance, if the parts share high sequence homology with each other, DNA
assembler may not be the ideal strategy because misassemblies are likely. These
sequence dependencies are difficult to detect manually, so computational tools
are needed to suggest the best assembly strategy. If these intricacies are not
considered carefully, many problems may arise.

One of the most widely used DNA assembly automation packages is j5, which
designs combinatorial libraries and hierarchical assemblies using an extensive set
of design rules [57]. It also takes advantage of the ever-decreasing cost of DNA
synthesis and suggests synthesis when it is cost-effective. j5 has an extensive
cost optimization option that not only helps with the assembly protocol but also
minimizes cost, making it a useful tool for constructing large numbers of
pathways [58].
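The kind of decision j5 automates (synthesize a part or amplify it by PCR) can be sketched with a toy cost rule. This is not j5's algorithm: the prices, the PCR failure rate, and the `Part` and `choose_route` names are hypothetical placeholders chosen only to show the comparison.

```python
from dataclasses import dataclass


@dataclass
class Part:
    name: str
    length_bp: int
    has_template: bool  # is a PCR template already available?


def choose_route(part: Part,
                 synth_cost_per_bp: float = 0.10,  # hypothetical price per bp
                 pcr_cost: float = 30.0,           # hypothetical primers + reagents
                 pcr_failure_rate: float = 0.2) -> tuple[str, float]:
    """Return the cheaper route and its expected cost for one part."""
    synthesis = part.length_bp * synth_cost_per_bp
    if not part.has_template:
        return "synthesize", synthesis
    # Expected PCR cost when failed reactions are repeated until one succeeds.
    pcr = pcr_cost / (1.0 - pcr_failure_rate)
    return ("pcr", pcr) if pcr <= synthesis else ("synthesize", synthesis)
```

Under these placeholder numbers, short parts such as promoters tip toward synthesis and long templated genes toward PCR; as synthesis prices fall, the crossover length rises.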

Another software package, Raven, has an interactive learning function [59]. It
designs the assembly strategy, receives feedback from the user, and, if one of the
assembly steps fails for any reason, changes the strategy to avoid that specific
step. Raven-optimized assemblies were reported to outperform non-optimized
assemblies (p < 0.0001).


4 Pathway Optimization 


A designed pathway often does not function optimally in a desired host. It is
therefore crucial to optimize a number of factors to obtain a functional and
efficient pathway. Pathway optimization tools can be classified into three groups:
gene expression, protein function, and spatial localization.


4.1 Gene Expression 


The introduction of a set of heterologous genes usually imposes a metabolic burden
on the host. As a result, the chemical of interest may not be produced at yields
that meet the expectations for industrial implementation. To alleviate the
metabolic stress and hence increase the yield of the chemical, the expression of a
heterologous pathway can be improved by tuning a number of parameters that
usually affect transcription and/or translation.


4.1.1 Transcriptional Level 
Plasmid, Chromosomal Integration and Copy Number 


The introduction of metabolic pathways into a host is usually based on three
different platforms: high-copy-number plasmids (HCP), low-copy-number plasmids
(LCP), and chromosomal integration (CI). Each platform has its own advantages
and drawbacks, so selecting the right cloning platform when engineering a
metabolic pathway can be an arduous task.

Many pathway designs rely on plasmids because of their availability and variety.
Traditionally, it was believed that using HCP would benefit pathway expression,
since more gene copies would lead to higher protein expression and thus to
overproduction of the chemical of interest. This is the case for the salicylate
biosynthesis pathway in E. coli. When EntC (isochorismate synthase) from E. coli
and PchB (isochorismate pyruvate lyase) from Pseudomonas fluorescens were
expressed from an HCP, the salicylic acid (SA) titer reached ~770 mg/L, whereas
with an LCP production dropped to ~200 mg/L [60]. Nevertheless, LCP can also give
better expression levels than HCP [61]. Possible explanations include cellular
toxicity of the expressed heterologous proteins or limited availability of the
cellular expression machinery, such as transcription factors, so that increasing
the DNA copy number does not increase expression [47, 62]. Recently, Wu and
coworkers engineered E. coli to produce resveratrol from L-tyrosine. The authors
divided the pathway into three modules, expressed from individual plasmids with
different copy numbers (from 10 to 100), to modulate and alleviate bottlenecks in
the pathway. A combination of low- and medium-copy-number plasmids resulted in
higher resveratrol production (~35 mg/L). Interestingly, when the high-copy-number
plasmid was used for any module, resveratrol titers dropped dramatically [63].

Furthermore, the use of plasmids is usually associated with a metabolic burden
on the cells, which is particularly evident with HCP. Cells carrying plasmids have
been shown to grow generally more slowly than cells without plasmids
[47, 64, 65]. This burden is believed to be partially linked to the cost of
maintaining the plasmid in the cell. Recently, Karim and coworkers investigated
the factors that influence this "plasmid burden" in yeast, evaluating the effect of
a number of plasmid elements, for instance origin of replication, selection
markers, promoters, and copy number, in haploid and diploid cells [66].
Interestingly, this study uncovered interactions between elements that could
mask the individual effects of plasmid elements. For example, increased plasmid
loads were correlated with decreased growth rates, but this impact was more
evident in diploid cells than in haploid cells. Selection markers, especially
auxotrophic ones, could also impair growth significantly. In addition, plasmid
copy number could be modified within a certain range by all of the factors
evaluated. These data reflect an intricate contribution of different factors to the
plasmid burden, which should be taken into account to make an informed decision
when choosing a plasmid for metabolic pathway engineering [66].

Although plasmids are easy to handle and allow flexibility, they suffer from
genetic instability, which makes it necessary to maintain selective pressure using
strictly formulated media or antibiotics that can raise the cost of producing a
chemical [47, 64, 67]. Moreover, plasmid-based protein expression is not
consistent from cell to cell, indicating that limited copy-number control
compromises reproducibility [47]. These disadvantages are making chromosomal
integration the method of choice.

Chromosomal integration overcomes these drawbacks. Gene integration into the
host genome has been shown to produce reliable protein expression patterns and
unaffected growth rates, and it also bypasses the need for selective compounds.
Additionally, genomic integration is usually in single or low copy number, so
protein toxicity and competition for metabolites can be buffered [47, 64, 67].
Nevertheless, the genetic context in which heterologous genes are integrated also
seems to influence protein expression [64, 68]. Recently, Yin and coworkers
studied the impact of chromosomal location on polyhydroxybutyrate (PHB)
production in E. coli. The phaCAB operon (PHB synthesis pathway) from Ralstonia
eutropha and a red fluorescent protein (RFP) gene were integrated downstream of
13 different chromosomal locations, some of them with high transcriptional
activity. They found similar results for rfp and phaCAB: the asnB (asparagine
synthetase B) location showed the highest transcriptional levels of the
13 locations evaluated, as measured by real-time PCR. However, a single copy of
the phaCAB operon did not produce detectable levels of PHB. The operon was then
integrated into the chromosome in multiple copies via the Cre-loxP system. PHB
was detected only when four copies of the operon were introduced, and PHB levels
increased with the number of copies of the pathway, reaching a maximum
(~34.1 wt%) with 50 copies [64]. In parallel, plasmid-based phaCAB expression was
evaluated. Despite the high levels obtained (43.68 wt%), PHB production dropped
dramatically to 8.08 wt% when antibiotic pressure was removed, demonstrating the
instability of the plasmid system [64]. This work illustrated the importance of
chromosomal integration, genomic context, and copy number for reaching high
production levels of a chemical.

In light of these results, chromosomal integration strategies should allow
efficient integration of multiple copies into specific regions of the genome.
Recent studies have tackled some limitations of traditional methods. Gu and
coworkers used the yeast flippase recombinase (FLP) to optimize gene
overexpression for amino acid production in E. coli. FLP recombines two DNA
sequences containing a 34-bp recombination site (FRT); accordingly, one main
requirement is that the host genome contain an FRT site. The authors determined
that increasing the concentration of donor plasmid in the transformation and the
number of FRT sites in the chromosome increased the number of copies inserted
into the genome [67]. Using this strategy, L-tryptophan production was optimized.
Introducing two copies of the aroK gene (shikimate kinase) into the chromosome
increased L-tryptophan levels by ~87%, although additional copies decreased
production. Similarly, the three genes serA*, serB, and serC for L-serine
overproduction were inserted into the chromosome in different copy numbers. The
best L-serine-producing strain contained a 10:4:4 copy-number combination of
serA*, serB, and serC, respectively. This FLP/FRT recombination strategy allowed
the copy number of integrated pathway genes to be optimized and balanced in a
single step [67].

Chromosomal integration of an alginate-degrading pathway through
recombinase-assisted genome engineering (RAGE) resulted in 40-fold higher ethanol
titers from brown macroalgae than the corresponding plasmid-based counterparts
[69, 70]. This study again highlighted the instability of plasmid-based pathways.
In addition, it revealed that the distance between the chromosomal origin of
replication and the integration point affected growth, indicating the important
role of chromosomal location in pathway expression. In this case, modified
enzymatic pathways of 34 to 59 kb were efficiently integrated into the E. coli
chromosome using the Cre-lox recombination system. Moreover, the authors applied
the FLP/FRT recombination strategy to remove the antibiotic marker, generating a
markerless strain for further chromosomal modifications. This approach also made
it possible to balance the pathway copy number, which allowed higher cell
densities [69]. These recombinase-based methods are efficient and permit genomic
integration into specific regions marked by recombination sites (loxP and/or FRT,
for instance), but this advantage becomes a limitation because these sites must
first be introduced into the genome by other approaches.

CRISPR/Cas9 (clustered regularly interspaced short palindromic repeats/
CRISPR-associated protein 9 nuclease) is a powerful tool for generating
double-strand breaks (DSBs) in the yeast chromosome at a single locus or at
multiple loci with high efficiency [71-74]. By combining the editing properties of
CRISPR/Cas9 with the efficiency of yeast in vivo DNA assembly, a multi-gene
enzymatic pathway can be inserted efficiently, reliably, and without markers.
Using the CasEMBLR method, Jakočiūnas and coworkers integrated a carotenoid
pathway composed of 15 parts and also developed a tyrosine-producing yeast
strain by inserting 10 parts. The advantage of this method is that a set of linear
DNA parts (for instance promoters, ORFs, and terminators) with homologous ends
can be assembled and integrated in a single step at any desired location in the
genome, with efficiencies ranging from 30 to 90% and without selectable markers
[71]. Although introducing multiple copies of the same element can be
challenging, CasEMBLR may allow biological parts to be swapped easily and reduce
the effort of constructing donor plasmids containing different combinations of
elements.

Similarly, Shi and coworkers recently exploited CRISPR/Cas9-based DSBs combined
with yeast in vivo recombination. The authors developed Di-CRISPR
(delta-integration CRISPR), which targets delta sites in the S. cerevisiae
chromosome to integrate multiple pathway copies. Di-CRISPR enabled the
integration of 18 copies of a large (24-kb) cassette comprising xylose utilization
and (R,R)-2,3-butanediol (BDO) production pathways in a single step with high
efficiency [75].


Promoter Strength 


The promoter is a control element with a major influence on gene expression,
since strong promoters usually result in increased mRNA levels. Increasing
promoter strength is therefore a successful approach to enhancing protein
expression. In multi-gene pathways, however, this approach can lead to
transcriptional/translational stress, metabolite accumulation, and toxicity. To
prevent these problems, the promoter strengths of the different genes can be
balanced. Much effort has recently gone into characterizing and developing
libraries of natural, hybrid, and synthetic promoters with a wide dynamic range of
strengths that allow precise regulation of each gene in the pathway
[47, 76-79].

For example, Liang and coworkers developed a set of inducible hybrid promoters
based on the GAL promoter in yeast. These promoters were tightly regulated and
could be induced by minimal concentrations of estradiol (10 nM). By refactoring a
zeaxanthin biosynthetic pathway in yeast with this set of promoters, the authors
reported a 50-fold improvement in production over the pathway with constitutive
promoters [80].

Lee and coworkers characterized a set of yeast constitutive promoters that 
allowed them to develop a linear regression model to engineer pathways in a 
predictable fashion. By this approach the authors achieved the production of 
violacein for the first time in yeast [81]. 

Similarly, Zhang and coworkers optimized the production of amorphadiene (AD), a
precursor of artemisinin, using the experimental design-aided systematic pathway
optimization (EDASPO) method. The pathway was divided into four modules, with
the genes under the control of T7 and T7-variant promoters. By characterizing a
few combinations, the authors developed a linear regression model that enabled
further optimization and achieved a threefold higher AD titer [82]. Balancing
promoter strength has been used successfully to engineer multi-gene pathways
[19, 63, 83, 84].
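The linear regression idea behind these approaches can be sketched as follows: each design is encoded as the promoter strength chosen for each module, a linear model is fitted to a few measured combinations, and the model then ranks all untested combinations. This is a minimal sketch with synthetic data, assuming NumPy and an additive model without interaction terms; it is not the EDASPO implementation, and the strength levels and coefficients are hypothetical.

```python
import itertools

import numpy as np


def fit_linear_model(designs, titers):
    """Least-squares fit of titer = b0 + sum_i b_i * strength_i."""
    X = np.column_stack([np.ones(len(designs)), np.asarray(designs, dtype=float)])
    coef, *_ = np.linalg.lstsq(X, np.asarray(titers, dtype=float), rcond=None)
    return coef


def predict(coef, design):
    """Predicted titer for one combination of promoter strengths."""
    return float(coef[0] + np.dot(coef[1:], design))


def best_design(coef, levels, n_modules):
    """Rank every combination of promoter levels with the fitted model."""
    return max(itertools.product(levels, repeat=n_modules),
               key=lambda d: predict(coef, d))


# Synthetic example: three modules, relative promoter strengths 1-3.
true_coef = np.array([1.0, 2.0, -0.5, 0.3])
tested = [(1, 1, 1), (3, 1, 1), (1, 3, 1), (1, 1, 3), (2, 2, 2)]
titers = [true_coef[0] + np.dot(true_coef[1:], d) for d in tested]
coef = fit_linear_model(tested, titers)
```

Only five of the 27 possible combinations are "measured" here, yet the fitted model recovers the synthetic coefficients and points to the untested optimum. Real data are noisy and may contain interactions between modules, which a purely additive model cannot capture.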


Transcriptional Terminators 


Although terminators play an important role in transcription termination and in
mRNA half-life [85], there have been fewer studies on terminator development and
characterization for metabolic engineering applications. Recently, the impact of a
number of terminators on gene expression in yeast was studied, revealing that
they can modulate expression as much as promoters do [86]. The authors found a
strong correlation between expression and increased mRNA half-life, suggesting
that terminators influence mRNA stability [86]. In another study, the same group
developed a set of short synthetic terminators that performed similarly to
commonly used native yeast terminators. To evaluate the utility of these
terminators in pathway engineering, they expressed the codon-optimized
cis-aconitic acid decarboxylase (CAD1) gene from Aspergillus terreus under the
control of the weak promoter TEFmut3, followed by each of the synthetic
terminators. Constructs containing the synthetic terminators gave similar or even
higher itaconic acid titers than those containing the yeast CYC1 terminator [87].
In addition, the synthetic terminators were functional in a different yeast
species, suggesting that they could be used more generally [87]. These studies
highlight the potential of terminators for modulating gene expression and
balancing metabolic pathways for optimal function in the designated host.


CRISPR-Based Modulation 


A recent strategy for modulating gene expression exploits the properties of the
CRISPR system. Qi and coworkers developed a CRISPR-based system for
genome-scale gene repression, CRISPR interference (CRISPRi) [88]. dCas9 (a
mutant RNA-guided DNA endonuclease lacking nuclease activity) binds a small
guide RNA (sgRNA) that targets specific DNA sequences, and the sgRNA-dCas9
complex then binds these DNA regions. With a precisely designed sgRNA,
different DNA elements such as promoters or ORFs can be targeted. The
sgRNA-dCas9 complex can therefore efficiently block transcription at different
stages; for instance, it can impede transcription factor binding, RNA polymerase
binding, or transcriptional elongation [88]. CRISPRi has been used successfully in
metabolic engineering to modulate multiple genes of a polyhydroxyalkanoate
(PHA) biosynthesis pathway in E. coli. By designing sgRNAs against different
targets that produce a range of expression levels, it was possible to modulate the
4-hydroxybutyrate (4HB) content of poly(3-hydroxybutyrate-co-4-hydroxybutyrate)
[P(3HB-co-4HB)] [89].
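To make the sgRNA design step concrete, the sketch below lists candidate 20-nt target sequences on both strands of a DNA region. It assumes the Streptococcus pyogenes dCas9 commonly used for CRISPRi, which requires an NGG protospacer-adjacent motif (PAM) immediately 3′ of the target; it ignores off-target and efficiency scoring, which dedicated design tools handle, and `find_protospacers` is a hypothetical helper.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_protospacers(region: str, spacer_len: int = 20) -> list[tuple[str, int, str]]:
    """Return (strand, start, spacer) for every spacer followed by an NGG PAM.

    `start` is 0-based on the strand being read ('+' = region as given,
    '-' = its reverse complement).
    """
    region = region.upper()
    hits = []
    for strand, seq in (("+", region), ("-", reverse_complement(region))):
        for start in range(len(seq) - spacer_len - 2):
            pam = seq[start + spacer_len:start + spacer_len + 3]
            if pam[1:] == "GG":  # N-G-G: any base followed by two guanines
                hits.append((strand, start, seq[start:start + spacer_len]))
    return hits
```

Each hit is a candidate spacer that could be placed in an sgRNA to direct dCas9 to a promoter or ORF; choosing among candidates that give different repression levels is how a range of expression levels can be obtained.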

In a similar approach, dCas9 can be fused to a repressor or activator module,
allowing the gRNA-guided dCas9 to silence or activate a gene [90-92]. Zalatan and
coworkers further modified the CRISPR system by converting the gRNA into an
RNA scaffold (scRNA) containing sequences recognized by RNA-binding protein
modules. Transcriptional activators or repressors fused to these RNA-binding
proteins could then be recruited to the RNA scaffold at a desired DNA location to
activate or silence a gene [93]. Such CRISPR-based gene modulation approaches
were anticipated to be effective for multi-enzyme pathway optimization, since
they make it feasible to switch enzyme expression ON/OFF and thereby generate
predictable, flexible metabolite flux that maximizes pathway productivity. This
approach was validated with the highly branched violacein biosynthetic pathway
in yeast. This pathway consists of five genes (VioABEDC) producing violacein as
the final product. However, different modulation of the last two steps (VioD and
VioC) can generate four colored products. Thus, by switching the expression of
VioA, VioD, and VioC on and off, all possible pathway routes were achieved in a
predictable manner [93].


4.1.2 Translational Level
Codon Optimization 


It is widely believed that different organisms use codons differently depending on
the abundance and availability of tRNAs. Codon optimization strategies usually
replace rare codons with those that match the host codon bias, which can
therefore be an efficient way to favor translation of heterologous proteins. A
number of studies have reported improved protein expression after codon
optimization [94-96], and optimal codons have been suggested to improve mRNA
stability [97]. However, there is also empirical evidence that using frequent
codons is sometimes detrimental [98], and why this strategy is not consistent from
protein to protein remains highly controversial. Lanza and coworkers noted that
codon optimization is usually based on data from the whole genome, whereas
growth conditions and other factors can modify tRNA abundance, so traditional
approaches omit relevant information that can affect translation in specific
environments [99]. To overcome this drawback, the authors developed a
"condition-specific codon optimization" method that uses the codon bias of genes
expressed under the desired condition. This approach increased expression of the
catechol 1,2-dioxygenase gene in yeast 2.9-fold compared with a commercially
optimized version [99].
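A minimal sketch of the simplest codon optimization rule, replacing each codon with the synonymous codon most frequent in a reference set of host genes, is shown below. Passing only genes known to be expressed under the production condition gives a condition-specific table in the spirit of the approach described above, although the published method may differ in its details; the helper names are hypothetical, and the standard genetic code is assumed.

```python
from collections import Counter

# Standard genetic code, codons enumerated in TCAG order ("*" = stop).
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: _AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(_BASES)
               for j, b in enumerate(_BASES)
               for k, c in enumerate(_BASES)}


def codon_usage(reference_genes: list[str]) -> Counter:
    """Count in-frame codons in a set of reference coding sequences."""
    counts = Counter()
    for gene in reference_genes:
        gene = gene.upper()
        for i in range(0, len(gene) - 2, 3):
            counts[gene[i:i + 3]] += 1
    return counts


def preferred_codons(counts: Counter) -> dict[str, str]:
    """Map each amino acid (and stop) to its most frequent codon in `counts`.
    Ties, including amino acids absent from the reference, fall back to table order."""
    best: dict[str, str] = {}
    for codon, aa in CODON_TABLE.items():
        if aa not in best or counts[codon] > counts[best[aa]]:
            best[aa] = codon
    return best


def optimize(cds: str, counts: Counter) -> str:
    """Recode a coding sequence with the preferred synonymous codons."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    best = preferred_codons(counts)
    return "".join(best[CODON_TABLE[cds[i:i + 3]]] for i in range(0, len(cds), 3))
```

This rule changes only codon choice, never the encoded protein. It does not consider mRNA structure, which, as discussed next, may matter as much as codon bias.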

It was recently suggested that codon bias has little effect on translation efficiency
[100, 101] and that mRNA structure, especially in the region encoding the first
15-20 amino acids, is the main factor affecting translation efficiency [100-102],
since mRNA secondary structures may impede ribosome binding and pause
elongation [102]. In addition, control elements that are difficult to identify may be
embedded in the coding region; randomizing the codon sequence can disable
these hidden elements [102]. Computational tools can assist in designing
optimized genes that avoid the drawbacks arising from mRNA structure and
cis-regulation [27, 102, 103].


Optimization of RBS 


In prokaryotes, translation initiates when the 16S rRNA of the small ribosomal
subunit binds to the Shine-Dalgarno (SD) sequence within the ribosome binding
site (RBS) of the mRNA. The SD sequence is usually located 5-15 bases upstream
of the start codon, and changes in its sequence, which dictate its affinity, can alter
expression levels by several orders of magnitude, enabling fine-tuning of pathway
expression [27, 77, 102, 104-106].
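The position of the SD sequence relative to the start codon can be located with a simple scan. The sketch below assumes the E. coli consensus SD hexamer AGGAGG and scores candidate sites only by the number of matching bases; tools that predict translation initiation rates instead use thermodynamic models of mRNA-rRNA binding, so this is illustrative only, and `locate_sd` is a hypothetical helper.

```python
SD_CONSENSUS = "AGGAGG"  # E. coli Shine-Dalgarno consensus (assumed)


def locate_sd(mrna: str, start_index: int, window: int = 20):
    """Find the hexamer upstream of the start codon that best matches the SD consensus.

    Returns (position, matching_bases, spacing), where spacing is the number of
    bases between the end of the hexamer and the start codon, or None if the
    upstream window is shorter than the motif. RNA input (U) is accepted.
    """
    seq = mrna.upper().replace("U", "T")
    width = len(SD_CONSENSUS)
    best = None
    for pos in range(max(0, start_index - window), start_index - width + 1):
        score = sum(a == b for a, b in zip(seq[pos:pos + width], SD_CONSENSUS))
        spacing = start_index - (pos + width)
        if best is None or score > best[1]:  # keep the first of equally good hits
            best = (pos, score, spacing)
    return best
```

For example, in `TTTTAGGAGGTTTTTTTATG` the full consensus starts at position 4 and ends 7 bases before the ATG, a spacing within the typical range given above.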

Recent examples have demonstrated the effectiveness of RBS optimization for
increasing the productivity of metabolic pathways; for instance, astaxanthin, fatty
acid, and riboflavin titers were enhanced in E. coli [36, 107, 108]. However,
screening a combinatorial RBS library for a multi-gene pathway can be tedious
and impracticable even with high-throughput screening methods. Moreover, many
of the combinations may have detrimental effects, bearing in mind that translation
initiation can be affected by many factors: for instance, mRNA structure can
interfere with ribosome binding to the RBS, so a weak RBS can lead to low
expression levels; strong interactions with the RBS can also cause translational
stalling; and the distance between the SD sequence and the start codon has been
shown to be critical [27, 102]. Rational design is therefore highly desirable.
Current online tools use computational methods that circumvent these drawbacks
by considering all of the potential molecular interactions, and they design RBSs
with a wide range of translation initiation rates [27, 28, 109].
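The screening burden mentioned above grows quickly. If each of n genes is paired with one of k RBS variants, the library contains k^n combinations, and if clones are picked at random, the expected fraction of distinct variants sampled after m picks is 1 - (1 - 1/N)^m for a library of size N. The sketch below applies these two formulas under the assumption of an evenly represented library; the gene and variant counts are arbitrary examples.

```python
import math


def library_size(variants_per_gene: int, n_genes: int) -> int:
    """Number of combinations when each gene gets one of k RBS variants."""
    return variants_per_gene ** n_genes


def clones_for_coverage(size: int, coverage: float = 0.95) -> int:
    """Smallest m with expected coverage 1 - (1 - 1/size)**m >= coverage,
    assuming every variant is equally represented in the library."""
    if size == 1:
        return 1
    return math.ceil(math.log(1 - coverage) / math.log(1 - 1 / size))
```

For three genes with ten RBS variants each, the library has 1,000 members, and roughly 3,000 random clones (about three times the library size) are needed for 95% expected coverage; adding a fourth gene multiplies both numbers by ten.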


4.2 Protein Activity 


Despite the many efforts in enhancing and balancing gene expression in multi- 
enzyme pathways, in some cases the production of a desired chemical is still 
difficult to accomplish. In these cases, either the intrinsic activity/specificity or 
suboptimal environmental conditions can be a limiting factor. To achieve improved 
pathway outcomes, it is critical to modulate protein properties. 


4.2.1 Protein Engineering 


There are two general approaches to altering the intrinsic properties of a protein:
directed evolution and rational design. Both strategies have been applied
successfully in protein engineering for pathway optimization. Whereas rational
design requires thorough knowledge of a protein's structure-function
relationships, directed evolution explores the whole protein sequence and
circumvents the limitations of incomplete structural information [110, 111].

Lian and coworkers [13] constructed a cellobiose utilization pathway in yeast to
produce ethanol from cellulosic biomass. It consisted of a cellodextrin transporter
and a β-glucosidase. In this study, cellodextrin transporter 2 (CDT2) from
Neurospora crassa was engineered by directed evolution to increase its cellobiose
uptake activity. CDT2 is a facilitator and thus does not consume ATP for
cellobiose uptake, which may provide energetic benefits in anaerobic cultures,
although it is less efficient than other transporters such as CDT1. After three
rounds of directed evolution, the best evolved CDT2 variant enabled a more than
fourfold increase in cellobiose consumption rate and ethanol productivity under
anaerobic conditions. Further experiments showed that both the specific activity
and the expression level of the transporter were improved. With this approach,
the total ethanol yield was increased by more than 25% [13].

In another recent example, a biosynthetic pathway for cis,cis-muconic acid
(ccMC) production consisting of three enzymes, AroZ, AroY, and CatA, was
engineered in E. coli. The authors observed accumulation of a metabolic
intermediate, catechol, the substrate of CatA (catechol 1,2-dioxygenase). Replacing
inducible promoters with constitutive ones did not relieve the bottleneck. Instead,
rational design to alter CatA activity mitigated the limiting step: higher enzymatic
activities were obtained by widening the channel to the catalytic pocket, and the
improved CatA variant led to ~26% higher ccMC productivity [112].

Introduction of unnatural amino acids (UAAs) in protein sequences can also 
diversify the biochemical properties of an enzyme or even lead to new functional- 
ities. Although incorporation of UAAs has been used in protein engineering, 
resulting in improved biocatalysts, it has not been applied in pathway optimization 
probably because the introduction of orthogonal pairs of aminoacyl-tRNA synthe- 
tase/tRNA in the desired host is still challenging and needs further optimization 
[113-116]. The use of engineered enzymes containing UAA in pathway optimiza- 
tion may increase the spectrum of catalytic reactions that can be performed by 
engineered hosts to address biosynthetic bottlenecks. 


4.2.2 Homologous Proteins 


Modifying protein properties by protein engineering to meet pathway
requirements can be challenging and often fails. It may therefore be more feasible
to find the appropriate protein among those already available. In nature, proteins
capable of executing the same function exist in a variety of organisms. Despite
playing similar catalytic roles, they may exhibit diverse features, such as different
pH and temperature optima, higher activities, specificities, promiscuities, and
regulation, among others. Thus, selecting the subset of enzymes that performs best
in the desired host is essential for constructing an efficient pathway. Nevertheless,
limited information about the biochemical properties of proteins can hinder the
design. In these circumstances, a less rational approach such as a combinatorial
library can compensate for the lack of information. For example, Gluconobacter
oxydans WSH-003 was engineered to produce 2-keto-L-gulonic acid (2KLG), a
precursor of vitamin C. The heterologous pathway consisted of L-sorbose
dehydrogenases (SDH) and L-sorbosone dehydrogenases (SNDH) from
Ketogulonicigenium vulgare WSH-001. In this study, five SDH and two SNDH
enzymes from K. vulgare WSH-001 with different features [117] were combined.
Ten combinations were analyzed, and the best one achieved 4.9 g/L of 2KLG [118].

Recently, a more rational approach was used to engineer S. cerevisiae to produce
taxadiene. The conversion of farnesyl diphosphate (FPP) to geranylgeranyl
diphosphate (GGPP) by geranylgeranyl diphosphate synthase (GGPPS) is a
limiting step in taxadiene production, so optimizing this enzyme may increase
productivity. A computational approach was used to predict the binding affinity of
six GGPPSs from different organisms for their substrate FPP. Protein modeling and
docking predicted that GGPPSbc (from Taxus baccata × Taxus cuspidata) might
benefit the limiting reaction. The authors confirmed the model empirically and
observed that the taxadiene titer improved more than tenfold with GGPPSbc [119].


4.2.3 Cofactors


A large number of enzymes involved in metabolic reactions depend on cofactors to
function properly. When an exogenous pathway is introduced into a host,
competition for cofactors and/or redox imbalance can emerge, causing metabolic
stress, impaired cellular growth, and an overall reduction in pathway productivity
[120]. Thus, tuning cofactor concentrations [121, 122] or swapping cofactor
specificity [123] can be used to enhance pathway efficiency.

Lim and coworkers compensated for the redox imbalance created by the introduction 
of a synthetic n-butanol pathway in E. coli [122]. In this study, the E. coli host had 
previously been engineered for butyrate production, with the cofactor regeneration 
pathway redirected to use butyrate as the final electron acceptor [83]. Introducing the 
heterologous n-butanol pathway into this host generated an NADH deficiency, 
highlighting the need for further engineering. To supply more NADH, the authors 
modulated the pyruvate dehydrogenase (PDH) complex, which catalyzes the 
oxidative decarboxylation of pyruvate into acetyl-CoA, producing CO2 and NADH. 
Because the native complex is strongly inhibited under anaerobic conditions, a 
mutant PDH complex that remains active anaerobically, driven by strong control 
elements, was integrated into the chromosome, yielding a 12% improvement in 
n-butanol titers. However, some pyruvate could still be converted by the 
NAD+-independent pyruvate formate lyase (PFL) into acetyl-CoA and formate. The 
NAD+-dependent formate dehydrogenase (FDH) from yeast converts formate into 
CO2 and produces NADH, so fdh1 expression was fine-tuned using synthetic 
5'-UTRs. The optimal engineered strain showed 35% higher n-butanol titers, 
reaching 6.8 g/L [122]. 
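The NADH argument above can be made concrete with a simple redox balance. The sketch below assumes textbook stoichiometry rather than values reported in [122]: glycolysis yields 2 NADH per glucose, PDH yields 1 NADH per pyruvate, PFL yields none (the reducing equivalents leave as formate), FDH recovers 1 NADH per formate, and the four reduction steps from 2 acetyl-CoA to one n-butanol consume 4 NADH. The constant and function names are hypothetical.

```python
"""Toy NADH balance for converting one glucose into one n-butanol.

Assumed textbook stoichiometry (not taken from the cited study):
  glycolysis: glucose -> 2 pyruvate + 2 NADH
  PDH:        pyruvate -> acetyl-CoA + CO2 + NADH
  PFL:        pyruvate -> acetyl-CoA + formate
  FDH:        formate -> CO2 + NADH
  pathway:    2 acetyl-CoA + 4 NADH -> n-butanol
"""

NADH_FROM_GLYCOLYSIS = 2   # per glucose
NADH_FOR_BUTANOL = 4       # four reduction steps from 2 acetyl-CoA

# NADH recovered per pyruvate converted to acetyl-CoA, by route.
NADH_PER_PYRUVATE = {
    "PFL": 0,      # reducing equivalents are carried away in formate
    "PDH": 1,      # oxidative decarboxylation reduces NAD+ directly
    "PFL+FDH": 1,  # FDH oxidizes the formate, recovering one NADH
}


def nadh_balance(route: str) -> int:
    """Return NADH surplus (+) or deficit (-) per glucose -> n-butanol."""
    supply = NADH_FROM_GLYCOLYSIS + 2 * NADH_PER_PYRUVATE[route]
    return supply - NADH_FOR_BUTANOL


if __name__ == "__main__":
    for route in NADH_PER_PYRUVATE:
        print(f"{route:8s} NADH balance per glucose: {nadh_balance(route):+d}")
```

Under these assumptions, the PFL route leaves a deficit of 2 NADH per n-butanol, whereas PDH, or PFL followed by FDH, closes the balance, which matches the direction of the engineering described above.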

In another example, Gao and coworkers engineered G. oxydans strain WSH-003 to 
produce 2KLG. The heterologous pathway consisted of SDH and SNDH from 
K. vulgare WSH-001. These dehydrogenases require pyrroloquinoline quinone 
(PQQ) as a cofactor and may compete with native PQQ-dependent proteins. 
Therefore, the PQQ biosynthetic gene cluster was overexpressed to avoid a PQQ 
bottleneck. Increasing the cofactor supply increased 2KLG production by 20% [118]. 

Similarly, Cui and coworkers observed that increased NADPH concentrations 
favored the production of shikimic acid (SA) in E. coli [121]. Shikimate 
dehydrogenase reduces 3-dehydroshikimate to shikimate using NADPH as a 
cofactor; therefore, NADPH availability may limit pathway productivity. The 
authors showed that overexpression of transhydrogenase (pntAB) and/or NAD 
kinase (nadK), two native enzymes involved in NADPH regeneration, more than 
doubled the SA titer [121]. 


4.3 Spatial Localization 


The efficiency of a pathway sometimes depends on factors unrelated to protein 
expression or catalytic activity. Toxicity of metabolic intermediates, reduced 
availability of intermediates because of diffusion or consumption by other metabolic 
pathways, and low local enzyme concentrations are some of the factors that hinder 
pathway efficiency. Colocalization of pathway enzymes can decrease intermediate 
loss, increase local enzyme concentrations, and reduce toxicity through metabolite 
channeling. Spatial colocalization can be achieved by anchoring the enzymes to a 
scaffold or by sequestering them in cellular compartments. This approach has 
been extensively reviewed elsewhere [4, 7, 124]. Here we describe a few recent 
successful examples. 
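To see why a higher local intermediate concentration can matter, consider the downstream enzyme as a simple Michaelis-Menten catalyst. The sketch below is a toy model only: the Vmax, Km, and concentration values are hypothetical, and it assumes that colocalization simply raises the intermediate concentration sensed by the downstream enzyme.

```python
"""Toy model: downstream rate versus local intermediate concentration.

All parameter values are hypothetical and chosen only for illustration.
"""


def michaelis_menten(vmax: float, km: float, s: float) -> float:
    """Michaelis-Menten rate v = Vmax * [S] / (Km + [S])."""
    if km <= 0 or s < 0:
        raise ValueError("km must be positive and s non-negative")
    return vmax * s / (km + s)


VMAX = 1.0    # hypothetical maximal rate, arbitrary units
KM = 0.5      # hypothetical Km of the downstream enzyme, mM

# Assumption: colocalization raises the intermediate concentration that the
# downstream enzyme "sees" from a diluted bulk value to a higher local value.
S_DISPERSED = 0.05   # hypothetical bulk concentration, mM
S_COLOCALIZED = 0.5  # hypothetical local concentration, mM

v_free = michaelis_menten(VMAX, KM, S_DISPERSED)
v_local = michaelis_menten(VMAX, KM, S_COLOCALIZED)

if __name__ == "__main__":
    print(f"dispersed: {v_free:.3f}  colocalized: {v_local:.3f}  "
          f"gain: {v_local / v_free:.1f}x")
```

With these made-up numbers the downstream rate rises about 5.5-fold. The gain shrinks as the local concentration approaches saturation (well above Km), so in this simple picture colocalization helps most for enzymes operating below saturation.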


4.3.1 Scaffold Strategies 


This strategy for spatial colocalization of enzymes is based on the interaction 
between the proteins of interest and a synthetic protein [125], RNA [126], or 
DNA [127] scaffold. Each enzyme is fused to a binding domain that recognizes a 
specific site on the scaffold, anchoring the enzyme to it. 

A recent example is the improvement of butyrate production in E. coli. Three 
enzymes of the pathway, 3-hydroxybutyryl-CoA dehydrogenase (Hbd), 
3-hydroxybutyryl-CoA dehydratase (Crt), and trans-enoyl-coenzyme A reductase 
(Ter), were fused to ligands for the GBD, SH3, and PDZ domains. When these 
constructs were expressed in E. coli, butyrate production increased from 
1.22 to 3.51 g/L [128]. The authors also observed a decrease in production of the 
by-product acetate and suggested that the scaffold directed carbon flux 
efficiently through the scaffold-bound enzymes [128]. 

Another approach is to use DNA molecules as a scaffold. In this case, the proteins 
of interest are fused to zinc-finger (ZF) domains that bind specific DNA motifs 
[127]. A plasmid can thus be designed to contain a number of recognition sites for 
different ZF domains. Conrado and coworkers demonstrated the feasibility of DNA 
scaffolds in metabolic engineering by increasing the production of trans-resveratrol, 
1,2-propanediol, and mevalonate in E. coli. In this study, the authors corroborated 
the hypothesis that the improved yields resulted from close proximity between the 
pathway enzymes, which optimizes metabolite channeling. For this purpose, the ZF 
binding motifs for each enzyme were either placed far from each other on the DNA 
scaffold (no proximity between enzymes) or separated by 2-12 bp spacers 
(proximity between enzymes). The improvements were abolished when the 
enzymes were far apart [127]. 
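The near-versus-far comparison can be sketched as simple sequence assembly. The zinc-finger target sites below are hypothetical 9-bp placeholders, not the sequences used in [127], and the poly-T filler stands in for whatever spacer sequence a real design would use; only the 2-12 bp spacer range comes from the text.

```python
"""Sketch of 'proximal' versus 'segregated' DNA scaffold layouts.

All site sequences are hypothetical placeholders, not validated ZF targets.
"""

# Hypothetical 9-bp target sites, one per ZF-fused enzyme, in pathway order.
ZF_SITES = {
    "enzyme_A": "GCGTGGGCG",
    "enzyme_B": "GACGCCGCG",
    "enzyme_C": "GTGGCGGAG",
}


def proximal_scaffold(order, spacer_bp, repeats, filler="T"):
    """Repeat units of adjacent sites (A-spacer-B-spacer-C) so enzymes sit close."""
    if spacer_bp < 0 or repeats < 1:
        raise ValueError("spacer_bp must be >= 0 and repeats >= 1")
    spacer = filler * spacer_bp
    unit = spacer.join(ZF_SITES[name] for name in order)
    return spacer.join([unit] * repeats)


def segregated_scaffold(order, copies, gap_bp, filler="T"):
    """Cluster all copies of each site together, with long gaps between clusters."""
    if copies < 1 or gap_bp < 0:
        raise ValueError("copies must be >= 1 and gap_bp >= 0")
    clusters = [ZF_SITES[name] * copies for name in order]
    return (filler * gap_bp).join(clusters)


if __name__ == "__main__":
    order = ["enzyme_A", "enzyme_B", "enzyme_C"]
    near = proximal_scaffold(order, spacer_bp=2, repeats=4)
    far = segregated_scaffold(order, copies=4, gap_bp=200)
    print(len(near), len(far))
```

In the proximal layout, consecutive pathway enzymes bind sites only a few base pairs apart; in the segregated layout, the same number of sites is present, but binding sites for different enzymes are separated by hundreds of base pairs.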


4.3.2 Compartmentalization


The use of scaffolds to organize enzymes spatially helps to improve metabolite 
channeling, but it can impede the proper folding of multimeric enzymes or impose a 
metabolic burden by consuming additional cellular resources to synthesize the 
scaffold [4]. Pathway encapsulation can overcome these issues. Expressing all the 
enzymes of a pathway, scaffold-free, within a specific cellular organelle avoids 
metabolite transport, diffusion, and leakage; prevents competition with other 
pathways for intermediates; escapes native regulation; and increases enzyme 
concentrations and the proximity between enzymes, because organelles are small 
compared with the cytoplasm [129]. 

One striking example of the benefits of compartmentalization is the targeting of 
the Ehrlich pathway to yeast mitochondria to produce isobutanol. The isobutanol 
pathway consisted of five enzymes divided into two sets: (1) acetolactate synthase 
(ALS), ketol-acid reductoisomerase (KARI), and dihydroxyacid dehydratase 
(DADH), which are present in mitochondria, and (2) α-ketoacid decarboxylase 
(α-KDC) and alcohol dehydrogenase (ADH), which are usually in the cytoplasm. In 
this study, α-KDC and ADH were directed to mitochondria by fusion with the 
N-terminal mitochondrial localization signal from subunit IV of the yeast 
cytochrome c oxidase (CoxIV). α-Ketoisovalerate (α-KIV) is produced by DADH 
in the mitochondria and must be transported to the cytoplasm to be further 
converted by α-KDC. The authors found that one limiting factor in the pathway 
was the availability of α-KIV in the cytoplasm. Thus, avoiding the transport of this 
intermediate from mitochondria to cytoplasm may increase its availability and 
hence the pathway titer. Overexpressing the five genes together with targeting 
α-KDC and ADH to mitochondria enabled isobutanol titers of 635 mg/L, a 
~1.7-fold improvement over the same pathway with α-KDC and ADH directed to 
the cytoplasm (380 mg/L) and over ninefold compared with the empty-plasmid 
control (67 mg/L) [129]. Additionally, the authors reported increased production 
of other branched-chain alcohols such as isopentanol and 2-methyl-1-butanol. They 
proposed that, because the first three enzymes of the pathway are also involved in 
the isoleucine, leucine, and valine biosynthetic pathways, they generate metabolic 
intermediates that can eventually serve as substrates for α-KDC and ADH, 
producing isopentanol and 2-methyl-1-butanol [129]. 
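The fold changes quoted above follow directly from the reported titers. A quick check, using only the values given in the text for [129]; the dictionary keys and the helper name are illustrative:

```python
"""Fold changes computed from the isobutanol titers reported in the text."""

TITERS_MG_PER_L = {
    "mito_KDC_ADH": 635,   # alpha-KDC and ADH targeted to mitochondria
    "cyto_KDC_ADH": 380,   # same pathway, alpha-KDC and ADH in the cytoplasm
    "empty_plasmid": 67,   # control strain
}


def fold_change(test: float, reference: float) -> float:
    """Ratio of a test titer to a reference titer."""
    if reference <= 0:
        raise ValueError("reference titer must be positive")
    return test / reference


if __name__ == "__main__":
    best = TITERS_MG_PER_L["mito_KDC_ADH"]
    for ref in ("cyto_KDC_ADH", "empty_plasmid"):
        print(f"vs {ref}: {fold_change(best, TITERS_MG_PER_L[ref]):.2f}-fold")
```

This gives about 1.67-fold over the cytosolic variant and about 9.48-fold over the empty-plasmid control.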

In another example, penicillin production was enhanced by targeting part of the 
penicillin biosynthetic pathway to the peroxisome in Aspergillus nidulans. Three 
enzymes are involved in the pathway: the last one (isopenicillin N acyltransferase, 
AatA) is located in the peroxisome, whereas the other two 
(δ-(L-α-aminoadipyl)-L-cysteinyl-D-valine synthetase, AcvA, and isopenicillin N 
synthase, IpnA) are in the cytoplasm. Because the intermediates need to be 
transported into the peroxisome, it was suggested that colocalizing all the enzymes 
in the same compartment might benefit penicillin production. The authors found 
that targeting AcvA to the organelle increased penicillin production 3.2-fold. 
Interestingly, targeting IpnA to the peroxisome drastically reduced penicillin 
production. One possible reason is that the redox state of the peroxisome did not 
provide appropriate conditions for the activity and stability of IpnA [130]. 

The compartmentalization strategy also enhanced itaconic acid production in 
Aspergillus niger. Overexpression of two enzymes, cis-aconitate decarboxylase and 
aconitase, in mitochondria led to a twofold improvement in itaconic acid production 
compared with overexpression of the two enzymes in the cytoplasm [131]. 
However, a similar approach, targeting cis-aconitic acid decarboxylase (CAD) from 
Aspergillus terreus to mitochondria, failed to improve itaconic acid production in 
S. cerevisiae [132]. 

Compartmentalization strategies can clearly be used in metabolic engineering to 
improve pathway efficiency. Nevertheless, the organelle environment must be taken 
into account, as its conditions may not be favorable for specific enzymatic reactions. 

Harnessing cellular organelles has been used for pathway engineering in 
eukaryotes. In prokaryotes, the use of bacterial microcompartments (MCPs) for 
metabolic engineering is a promising strategy [133, 134]. Bacterial MCPs are 
protein shells that encapsulate the metabolic enzymes of a specific process, 
together with metabolic intermediates that can be volatile or toxic to the cell 
[133, 134]. Although the use of MCPs in pathway optimization is at an early stage 
and needs further characterization, Lawrence and coworkers recently demonstrated 
their potential in metabolic engineering. The authors expressed an MCP from 
Citrobacter freundii in E. coli to create an intracellular bioreactor for ethanol 
production. Pyruvate decarboxylase (encoded by the pdc gene) and alcohol 
dehydrogenase (encoded by the adh gene) from Zymomonas mobilis were targeted to 
the heterologous MCP, and ethanol production almost doubled in strains expressing 
the MCP-targeted PDC and ADH [135]. 


5 Applications 


Metabolic engineering and synthetic biology tools have enabled the engineering of 
microorganisms to produce a wide spectrum of chemicals with applications in 
several fields. The examples described below highlight recent advances in the 
design, construction, and optimization of pathways for biosynthesis of valuable 
chemicals. 


5.1 Production of Pharmaceutical Products 


Natural products are the main source of drugs and pharmaceuticals. However, 
recovering these products from their natural sources is usually tedious, 
time-consuming, and inefficient, and chemical synthesis routes are not always 
available. Thus, there is increasing interest in developing new manufacturing 
platforms based on model microorganisms, which are easy to manipulate and can 
often carry out numerous enzymatic steps under mild conditions that are more 
environmentally friendly than chemical synthesis. 

The biosynthetic production of many pharmaceuticals has been accomplished. 
Examples include (2S)-pinocembrin, suggested for the treatment of cerebral 
ischemic injury [136, 137]; shikimic acid, a precursor of an anti-influenza drug 
[121]; catechins, precursors of anthocyanins and tannins [35]; resveratrol, a 
therapeutic compound [63, 127]; penicillin [130]; and N-acetylglucosamine, a 
treatment for cartilage disease [84]. Here we discuss the production of artemisinic 
acid [138] and opioids [139]. 

The large-scale synthesis of artemisinic acid, a precursor of the antimalarial drug 
artemisinin, is a remarkable example. Until recently, the only source of artemisinin 
had been its natural producer, the plant Artemisia annua. However, the supply of this 
plant to the pharmaceutical industry depended on environmental conditions, causing 
the price to fluctuate from year to year. Since 2004, many efforts have been made to 
make artemisinin production commercially affordable, especially for the developing 
world [138]. To this end, it was proposed to develop a microorganism-based 
platform capable of synthesizing at least 25 g/L of artemisinin [138]. There are two 
key steps in the biosynthesis of artemisinin: (1) synthesis of amorphadiene and 
(2) synthesis of artemisinic acid, which can be chemically converted into 
artemisinin [138]. Thus, the main objective was to overproduce amorphadiene. 
Although production of the intermediate amorphadiene in E. coli was improved up 
to 27.4 g/L [140], the subsequent steps toward artemisinic acid dissuaded 
researchers from continuing to engineer E. coli. The main reason for this decision 
was the generally limited ability of E. coli to express heterologous eukaryotic 
P450s, one of which is required to convert amorphadiene into artemisinic acid. As 
the success of the project was compromised at this point, it was concluded that 
changing the production host would benefit overall pathway productivity. 

S. cerevisiae was then engineered to produce artemisinic acid [141, 142] (Fig. 3). 
The first step to increase amorphadiene production was to overexpress the 
mevalonate pathway genes responsible for converting acetyl-CoA into FPP, using 
strong galactose-inducible promoters. The copy number of the tHMG1 gene 
(truncated HMG-CoA reductase) was also tripled, because its expression was found 
to be rate-limiting. The heterologous A. annua amorphadiene synthase gene (ADS), 
expressed from a high-copy plasmid, was codon-optimized for S. cerevisiae, 
although codon optimization did not improve amorphadiene production compared 
with the non-optimized gene. Finally, optimization of the fermentation conditions 
led to amorphadiene titers of 37-41 g/L [142]. The next step was to convert 
amorphadiene into artemisinic acid by introducing the amorphadiene oxidase 
cytochrome P450 (CYP71AV1) and its cognate reductase (CPR1), together with 
the A. annua ADS gene, under strong galactose-inducible promoters on a high-copy 
plasmid. To reduce the toxicity caused by high CPR1 levels, CPR1 was integrated 
into the genome as a single copy under a weak promoter (GAL3 promoter). Finally, 
cytochrome b5 (CYB5), artemisinic aldehyde dehydrogenase (ALDH1), and 
artemisinic alcohol dehydrogenase (ADH1) from A. annua were also integrated into 
the genome under a strong promoter (GAL7p) (Fig. 3). The resulting artemisinic 
acid titers were 25 g/L [143]. These levels allow the economical production of 
artemisinin through a photochemical transformation process developed by Sanofi 
[138, 143, 144]. Bioengineering of yeast has thus enabled the current cost-effective 
industrial production of artemisinin, making antimalarial treatment affordable in 
developing countries and hence saving lives [138]. 

Fig. 3 Scheme of pathway engineering for semi-synthetic artemisinin production in yeast. The 
first step was to increase amorphadiene levels by overexpressing the mevalonate pathway 
(from acetyl-CoA to FPP); it was also necessary to introduce ADS. The second step was to 
convert amorphadiene into artemisinic acid by introducing additional genes (see text for more 
details). Finally, purified artemisinic acid was converted into artemisinin through a chemical 
process. ERG10 acetoacetyl-CoA thiolase, ERG13 HMG-CoA synthase, tHMG1 truncated 
HMG-CoA reductase, ERG12 mevalonate kinase, ERG8 phosphomevalonate kinase, MVD1 
mevalonate diphosphate decarboxylase, IDI1 isopentenyl diphosphate isomerase, ERG20 farnesyl 
diphosphate synthase, FPP farnesyl diphosphate, ADS amorphadiene synthase, CYP71AV1 
cytochrome P450 enzyme, CYB5 cytochrome b5, ADH1 artemisinic alcohol dehydrogenase, 
ALDH1 artemisinic aldehyde dehydrogenase 
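The yeast strain design described in the text can be summarized as a small data structure, which makes it easy to query, for example, which parts ended up in the genome. The record fields and labels are illustrative choices, not a standard format; the entries only restate what the text reports, and "not stated" marks details the text does not give.

```python
"""Summary of the yeast artemisinic acid strain design described in the text.

Field names and labels are illustrative; entries restate the text, and
"not stated" marks details the text does not give.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class Part:
    gene: str        # gene or gene set
    origin: str      # source organism
    placement: str   # "high-copy plasmid", "genome", or "not stated"
    control: str     # promoter or copy-number note from the text


# Stage 1: raise amorphadiene (mevalonate pathway up to FPP, plus ADS).
STAGE_1 = [
    Part("MVA pathway genes", "S. cerevisiae", "not stated",
         "strong galactose-inducible promoters"),
    Part("tHMG1", "S. cerevisiae", "not stated", "copy number tripled"),
    Part("ADS", "A. annua", "high-copy plasmid", "codon-optimized"),
]

# Stage 2: convert amorphadiene into artemisinic acid.
STAGE_2 = [
    Part("CYP71AV1", "A. annua", "high-copy plasmid",
         "strong galactose-inducible promoter"),
    Part("ADS", "A. annua", "high-copy plasmid",
         "strong galactose-inducible promoter"),
    Part("CPR1", "A. annua", "genome", "single copy, weak GAL3 promoter"),
    Part("CYB5", "A. annua", "genome", "strong GAL7 promoter"),
    Part("ALDH1", "A. annua", "genome", "strong GAL7 promoter"),
    Part("ADH1", "A. annua", "genome", "strong GAL7 promoter"),
]


def genes_by_placement(parts, placement):
    """Return gene names, in listed order, whose placement matches."""
    return [p.gene for p in parts if p.placement == placement]


if __name__ == "__main__":
    print("genome:", genes_by_placement(STAGE_2, "genome"))
    print("plasmid:", genes_by_placement(STAGE_2, "high-copy plasmid"))
```

A frozen dataclass keeps each record immutable, so a design summary like this cannot be changed accidentally while it is being queried.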


Another interesting example is the production of opioids in S. cerevisiae 
[139, 145]. Opioid drugs are used in the medical treatment of severe pain. Currently, 
these drugs are derived from the opium poppy (Papaver somniferum). As with 
artemisinin, poppy agribusiness is susceptible to environmental factors, and it is 
also subject to strict governmental control. This study described the engineering 
efforts to produce thebaine and hydrocodone in baker's yeast. The biosynthetic 
pathway genes were divided into modules to facilitate their optimization (Fig. 4). 
The first step was to increase the carbon flux to (S)-reticuline biosynthesis. Four 
modules containing 17 genes from a variety of organisms (plants, bacteria, yeast, 
and mammals) were integrated into the genome. Module I was designed to increase 
the supply of L-tyrosine and 4-hydroxyphenylacetaldehyde (4-HPAA), the 
precursors of (S)-reticuline. Module II contained the genes to synthesize and 
recycle the tetrahydrobiopterin (BH4) redox cofactor. Module III included the genes 
to synthesize (S)-norcoclaurine. Module IV contained the genes to synthesize 
(S)-reticuline. Integrating these modules into the yeast genome yielded 20 µg/L of 
(S)-reticuline. Module V contained additional copies of three genes that were 
suggested to create a bottleneck in the pathway flux: mutated tyrosine hydroxylase 
(TyrH^WR), 4'-O-methyltransferase (4'OMT), and norcoclaurine synthase (NCS). 
Introducing module V led to a fourfold improvement in (S)-reticuline titers. 

The second step was the production of thebaine. The four enzymes involved in this 
process were engineered. First, the discovery of 1,2-dehydroreticuline 
synthase/reductase (DRS/DRR), an epimerase that converts (S)-reticuline into 
(R)-reticuline, was key to thebaine production. This epimerase was identified by 
bioinformatic analysis of genomic and transcriptomic databases. The next enzyme 
in the pathway, salutaridine synthase (SalSyn), was N-glycosylated, which reduced 
its activity. To prevent N-glycosylation, protein engineering was used to create 
chimeric SalSyn variants carrying different N-terminal regions from 
cheilanthifoline synthase (CFS), a plant P450 enzyme that had been heterologously 
expressed in yeast with high activity. Additionally, codon-optimized salutaridine 
reductase (SalR) and salutaridinol acetyltransferase (SalAT) homologues from 
different Papaver species were compared. The best combination of the four 
engineered enzymes in module VI comprised P. bracteatum DRS-DRR 
(PbDRS-DRR); the yeast codon-optimized P. bracteatum SalSyn N-terminal variant 
(yEcCFS1-83-yPbSalSyn92-504); yeast codon-optimized P. bracteatum SalR 
(PbSalR); and yeast codon-optimized P. somniferum SalAT (PsSalAT). The strain 
with all six modules (24 gene cassettes) integrated into the chromosome produced 
6.4 µg/L of thebaine, the first morphinan alkaloid in the biosynthetic pathway. The 
pathway was then extended to hydrocodone by introducing a seventh module 
containing thebaine 6-O-demethylase (T6ODM) and morphine reductase (MorB) 
(Fig. 4). The final strain harbored 26 genes in seven modules and produced 
0.3 µg/L of hydrocodone from glucose; this was the first biosynthesis of 
hydrocodone, which the poppy cannot produce [139]. The opioid levels obtained in 
this work do not support industrial implementation, as a single dose of this drug 
would require the fermentation of thousands of liters [139]. Further engineering is 
therefore needed to increase production levels. Nevertheless, this study 
demonstrated the potential of synthetic biology and metabolic engineering to design 
organisms beyond the limits of nature. 

Fig. 4 Scheme of pathway engineering for opioid (thebaine and hydrocodone) production in 
yeast. Genes included in the same module are represented by the same color. Module I contains the 
genes to increase precursors of (S)-reticuline: Tkl1p transketolase, Aro4p^Q166K 3-deoxy-D-arabino- 
2-heptulosonic acid 7-phosphate (DAHP) synthase (mutation Q166K), Aro7p^T226I chorismate 
mutase (mutation T226I), Aro10p phenylpyruvate decarboxylase. Module II contains the genes 
to synthesize and recycle the tetrahydrobiopterin (BH4) redox cofactor: PTPS 6-pyruvoyl 
tetrahydrobiopterin synthase, SepR sepiapterin reductase, QDHPR quinonoid dihydropteridine 
reductase, PCD pterin carbinolamine dehydratase. Module III contains the genes to synthesize 
(S)-norcoclaurine: TyrH^WR tyrosine hydroxylase (mutations R37E, R38E, W166Y), DHFR 
dihydrofolate reductase, DoDC L-DOPA decarboxylase, NCS norcoclaurine synthase. Module 
IV contains the genes to synthesize (S)-reticuline: 6OMT norcoclaurine 6-O-methyltransferase, 
CNMT coclaurine N-methyltransferase, 4'OMT 3'-hydroxy-N-methylcoclaurine 
4'-O-methyltransferase, NMCH N-methylcoclaurine hydroxylase. Module V contains additional 
copies of TyrH^WR, NCS, and 4'OMT. Module VI contains the genes to synthesize thebaine: 
DRS-DRR 1,2-dehydroreticuline synthase-1,2-dehydroreticuline reductase, SalSyn salutaridine 
synthase, SalR salutaridine reductase, SalAT salutaridinol 7-O-acetyltransferase. Module VII 
contains the genes to synthesize hydrocodone: T6ODM thebaine 6-O-demethylase, MorB 
morphine reductase 
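The "thousands of liters per dose" statement can be checked with back-of-envelope arithmetic. The sketch takes a hydrocodone titer of 0.3 µg/L (the value from [139]) and assumes, hypothetically, a 5 mg dose and complete product recovery; both assumptions are labeled in the code.

```python
"""Culture volume needed per dose at a given titer (illustrative only)."""

HYDROCODONE_TITER_UG_PER_L = 0.3   # ug/L, titer from the cited study
ASSUMED_DOSE_MG = 5.0              # HYPOTHETICAL single dose, mg


def liters_per_dose(dose_mg: float, titer_ug_per_l: float,
                    recovery: float = 1.0) -> float:
    """Liters of culture needed for one dose at a given recovery fraction."""
    if titer_ug_per_l <= 0 or not 0 < recovery <= 1:
        raise ValueError("titer must be positive and 0 < recovery <= 1")
    dose_ug = dose_mg * 1000.0          # mg -> ug
    return dose_ug / (titer_ug_per_l * recovery)


if __name__ == "__main__":
    volume = liters_per_dose(ASSUMED_DOSE_MG, HYDROCODONE_TITER_UG_PER_L)
    print(f"~{volume:,.0f} L of culture per assumed dose")
```

Even with perfect recovery, the assumed 5 mg dose corresponds to roughly 17,000 L of culture, and any realistic recovery loss only raises that figure.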


5.2 Production of Fuels and Chemicals


Increasing energy demand is pushing non-renewable fossil fuel resources to their 
limits. Thus, there is special interest in developing microbial cell factories able to 
produce fuels and alternatives to petroleum-derived chemicals. Recent examples 
include the microbial synthesis of isobutanol, isopentanol, 2-methyl-1-butanol 
[129], n-butanol [122], ethanol [69, 146], (2R,3R)-butanediol [147], fatty acids 
[36, 148], and fatty acid ethyl esters (FAEE) [149]. 

Baker's yeast S. cerevisiae was engineered for the production of FAEE. First, a 
wax ester synthase (WS), responsible for synthesizing FAEE from acyl-CoA and 
ethanol, was introduced. Five WSs from different organisms were evaluated, and 
the WS from Marinobacter hydrocarbonoclasticus DSM 8798 (WS2) gave the 
highest FAEE titer of the five, 6.3 mg/L [150]. The synthesis of acyl-CoA requires 
acetyl-CoA, an essential intermediate involved in several pathways, so acetyl-CoA 
availability could limit FAEE production. Two different strategies were used to 
increase cytosolic acetyl-CoA levels. The first was the introduction of an ethanol 
degradation pathway to redirect carbon flux toward acetyl-CoA. This pathway 
consisted of the endogenous alcohol dehydrogenase 2 (ADH2) and acetaldehyde 
dehydrogenase (ALD6), and a mutated variant of the acetyl-CoA synthetase (ACS) 
from Salmonella enterica that cannot be acetylated. These three enzymes were 
overexpressed from a high-copy plasmid together with WS2. The yield obtained 
was 408 ± 270 µg gCDW^-1, three times that of the strain carrying only WS2. 
However, reproducibility was compromised, probably because of variations related 
to the large high-copy plasmid. To avoid plasmid copy-number fluctuations 
between replicates, WS2 was separated from the ethanol degradation pathway and 
expressed from a different plasmid, which improved FAEE productivity 
significantly, by 2.7-fold [149]. Integrating five or six copies of ws2 into the yeast 
chromosome increased the FAEE titer more than fivefold compared with its 
plasmid-based counterpart [151]. The second strategy to increase cytosolic 
acetyl-CoA and NADPH levels was to introduce a heterologous phosphoketolase 
(PHK) pathway by expressing xpkA (encoding xylulose-5-phosphate 
phosphoketolase) and ack (encoding acetate kinase) from A. nidulans. Replacing 
ack with pta (encoding phosphotransacetylase) from Bacillus subtilis, which 
catalyzes the direct conversion of acetyl phosphate into acetyl-CoA, was also 
evaluated (Fig. 5). Both PHK pathways, xpkA/pta and xpkA/ack, together with the 
integration of ws2, increased FAEE production 1.6- to 1.7-fold (4,670 and 
5,100 µg FAEE gCDW^-1) compared with the strain engineered with ws2 only 
[149]. Increasing precursor and cofactor levels thus increased FAEE production, 
although further engineering is needed to reach FAEE levels suitable for 
commercial applications. 
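A rough ATP accounting illustrates one difference between the two PHK variants tested. This is a sketch under assumed textbook stoichiometry, not an analysis from [149]: acetate kinase converts acetyl phosphate to acetate while forming one ATP, the acetate must then be activated by acetyl-CoA synthetase at a cost of two ATP equivalents (ATP to AMP + PPi), whereas phosphotransacetylase transfers the acetyl group from acetyl phosphate to CoA directly, without ATP.

```python
"""ATP cost of turning acetyl phosphate into acetyl-CoA via two routes.

Assumed textbook stoichiometry; not taken from the cited study.
"""

# ATP equivalents gained (+) or spent (-) per step.
STEP_ATP = {
    "ack": +1,  # acetyl-P + ADP -> acetate + ATP (acetate kinase)
    "acs": -2,  # acetate + ATP + CoA -> acetyl-CoA + AMP + PPi
    "pta": 0,   # acetyl-P + CoA -> acetyl-CoA + Pi (phosphotransacetylase)
}

ROUTES = {
    "xpkA/ack": ["ack", "acs"],
    "xpkA/pta": ["pta"],
}


def net_atp(route: str) -> int:
    """Net ATP equivalents per acetyl-CoA formed from acetyl phosphate."""
    return sum(STEP_ATP[step] for step in ROUTES[route])


if __name__ == "__main__":
    for route in ROUTES:
        print(f"{route}: {net_atp(route):+d} ATP equivalents per acetyl-CoA")
```

The phosphoketolase step itself, which splits xylulose-5-phosphate into acetyl phosphate and glyceraldehyde-3-phosphate, is common to both routes and is left out. Since the text reports similar FAEE gains for both variants, energy cost is at most one of several factors at play.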


6 Conclusions 


Many attempts have been made to establish microbial platforms for the production 
of valuable compounds. Although there are some commercially successful exam- 
ples of microbial production of bio-based chemicals on an industrial scale, there are 
still a number of challenges remaining. Recent advances in synthetic biology and 
metabolic engineering have enabled the production of a wide range of chemicals in 
heterologous hosts. The examples described in this chapter have identified and 
overcome a variety of bottlenecks that can arise when a heterologous pathway is 
introduced into a host microorganism. However, when one bottleneck was bypassed, 
a new one often emerged. Iterative cycles of optimization are needed to achieve an 
efficient pathway, which can be tedious and time-consuming. This highlights the 
need for new approaches to expedite the process. It is anticipated that the increasing 
amount of genomic, metagenomic, and metabolic information will permit the 
development of accurate computational algorithms that can eventually help predict 
efficient biosynthetic pathways. Improved pathway designs combined with new 
experimental tools are expected to reduce efforts and to facilitate pathway con- 
struction and optimization. It is envisaged that in future years the number of 
chemicals efficiently produced in microbial platforms will increase dramatically. 

Synthetic Biology for Cell-Free Biosynthesis: 
Fundamentals of Designing Novel In Vitro 
Multi-Enzyme Reaction Networks 


Gaspar Morgado, Daniel Gerngross, Tania M. Roberts, and Sven Panke 


Abstract Cell-free biosynthesis in the form of in vitro multi-enzyme reaction 
networks or enzyme cascade reactions emerges as a promising tool to carry out 
complex catalysis in one-step, one-vessel settings. It combines the advantages of 
well-established in vitro biocatalysis with the power of multi-step in vivo pathways. 
Such cascades have been successfully applied to the synthesis of fine and bulk 
chemicals, monomers and complex polymers of chemical importance, and energy 
molecules from renewable resources as well as electricity. The scale of these initial 
attempts remains small, suggesting that more robust control of such systems and 
more efficient optimization are currently major bottlenecks. To this end, the very 
nature of enzyme cascade reactions as multi-membered systems requires novel 
approaches for implementation and optimization, some of which can be obtained 
from in vivo disciplines (such as pathway refactoring and DNA assembly), and 
some of which can be built on the unique, cell-free properties of cascade reactions 
(such as easy analytical access to all system intermediates to facilitate modeling). 


Keywords Cascade reaction, Combinatorial optimization, DNA assembly, 
Rational optimization, Scaling, System assembly 


Contents 

1 Introduction .......... 118

2 Recent Examples of Cascade Reactions .......... 120
2.1 Cascade Reactions with Purified Enzymes .......... 120
2.2 Cascade Reactions Using Cell-Free Extracts (CFXs) .......... 123


G. Morgado, D. Gerngross, T.M. Roberts, and S. Panke
Bioprocess Laboratory, Department of Biosystems Science and Engineering, ETH Zurich, 
Mattenstrasse 26, 4058 Basel, Switzerland 

e-mail: sven.panke@bsse.ethz.ch 


118 G. Morgado et al. 


3 Assembling Cascade Reactions .......... 124
3.1 [...] .......... 124
3.2 [...] .......... 124
3.3 [...] .......... 125
3.4 [...] .......... 126
4 [...] Reaction Cascades .......... 126
4.1 Computational Design of Novel Reaction Pathways .......... 129
4.2 Controlling and Optimizing System Composition .......... 130
4.3 [...] .......... 134
5 Summary .......... 138
References .......... 138


1 Introduction 


Biocatalysis — the catalytic conversion of a chemical compound by an enzyme — has 
made major contributions to the development of the bulk chemical, fine chemical, 
and pharmaceutical industries [1]. Even though it has become possible to use 
enzymes and homogeneous catalysts concomitantly [2], biocatalysis is usually 
used in isolation as one or a few steps in a chemical sequence. However, enzymes 
have one unique advantage over chemical catalysts: the similarity of their reaction 
conditions. A large fraction of enzymes evolved to operate under the same set of 
environmental conditions, namely aqueous media, neutral pH, and ambient 
temperature. In principle, this allows multi-enzyme systems to be set up easily, with 
multiple reactions proceeding simultaneously in the same vessel. Such systems 
enable large molecular modifications and the exploitation of a larger set of 
reactions, because thermodynamically less favored reactions can be combined with 
thermodynamically favored ones to obtain high yields. 
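The thermodynamic argument can be illustrated with a two-step cascade A ⇌ B ⇌ C run in one vessel. The sketch assumes ideal, unimolecular reactions at 25 °C and hypothetical standard free energies (+5 kJ/mol for A ⇌ B and -20 kJ/mol for B ⇌ C); the equilibrium constants follow from K = exp(-ΔG°/RT).

```python
"""Equilibrium of a two-step cascade A <=> B <=> C in one vessel.

Assumptions: ideal dilute solution, unimolecular steps, 25 degrees C, and
hypothetical standard free energies chosen only for illustration.
"""
import math

R = 8.314462618e-3   # gas constant, kJ/(mol*K)
T = 298.15           # temperature, K


def equilibrium_constant(delta_g: float, temperature: float = T) -> float:
    """K = exp(-dG / RT), with dG in kJ/mol."""
    return math.exp(-delta_g / (R * temperature))


dG_ab = 5.0     # kJ/mol, unfavorable step (hypothetical)
dG_bc = -20.0   # kJ/mol, favorable step (hypothetical)

k_ab = equilibrium_constant(dG_ab)
k_bc = equilibrium_constant(dG_bc)

# Unfavorable step alone, starting from pure A: fraction present as B.
x_single = k_ab / (1.0 + k_ab)

# Both steps in one vessel: B = k_ab*A and C = k_bc*B, so the fraction
# present as C is k_ab*k_bc / (1 + k_ab + k_ab*k_bc).
x_coupled = (k_ab * k_bc) / (1.0 + k_ab + k_ab * k_bc)

if __name__ == "__main__":
    print(f"K_AB = {k_ab:.3f}, K_BC = {k_bc:.0f}")
    print(f"conversion, single step: {x_single:.1%}; coupled: {x_coupled:.1%}")
```

With these numbers, the unfavorable step alone reaches only about 12% conversion, whereas coupling it to the favorable step pulls nearly all of A through to C (above 99%), because the overall equilibrium constant is the product of the two step constants.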

Of course this is also the operating principle of the metabolism of the living cell, 
and in fact cells excel at the generation of an amazing molecular diversity from only 
a few starting materials [3-6]. However, in terms of applications, cell-free systems 
feature a number of advantages: they do not suffer from additional mass transfer 
barriers such as cytoplasmic membranes, they can often handle non-conventional 
solvents better, they do not suffer to the same extent from toxic effects or 
unproductive reactions with starting material, intermediates or products, and finally 
they are easier to control, as the composition of the system is under the control of 
the operator (Fig. 1). In fact, such biocatalytic systems or “cascade reactions” have 
been in use for synthetic purposes for quite a long time, in particular in the fields of 
cofactor regeneration [7], the production of monosaccharides [8, 9] or activated 
monosaccharides [10], oligosaccharides [11], or enantiomerically pure compounds 
in high yield [12], both in the academic and the industrial domains. Here, we 
discuss the emerging synthetic biology of such cascade reactions, specifically 
some recent examples of cascade reactions and methods to design, implement, 
and optimize them. 

Fig. 1 Design and implementation of multi-enzyme cascades in fine chemical production. (a) To 
make a given product from a desired starting material, one must choose whether to make the 
product via chemical or biological means. (b) If the biological route is chosen, the next step is to 
identify enzymes that could synthesize the chemical of interest and to choose parts and design 
constructs accordingly. To obtain the desired enzymes specifically, the host strain can be 
engineered by inserting sequences encoding purification tags into the genes of the desired proteins 
(e.g., a His6-tag, an epitope for antibody recognition, or a binding domain to attach the enzyme to a 
given scaffold), or by introducing mutations into target genes to confer thermostability, allowing 
enrichment of thermostable enzymes during heat treatment. Other means of engineering include 
up- or downregulation or deletion of genes, for example the knockout of enzymes catalyzing sink 

2 Recent Examples of Cascade Reactions 


One laborious aspect of implementing cascade reactions is that several enzymes 
have to be used, and obtaining them, which includes gene cloning, overproduction, 
and purification, requires considerable effort. Consequently, different approaches 
have been developed that either work with purified enzymes, thus enjoying 
maximum control over the system, or reduce the effort that goes into assembly in a 
variety of ways. We discuss examples of these different approaches. 


2.1 Cascade Reactions with Purified Enzymes 


A natural field of application for cascade reactions is the fine chemical or pharma- 
ceutical industry, where additional effort (e.g., protein purification) might be a 
minor inconvenience when compared to system control (including reproducibility, 
control over yield, or optical purity). An important class of compounds in the field 
of bioactive substances is the isoprenoids, an inexhaustible source of natural 
products [13]. An essential element of all isoprenoid pathways is the sequence of steps
from a starting material, such as phosphoenolpyruvate (PEP), to a central intermediate
in isoprenoid synthesis, such as isopentenyl pyrophosphate (IPP) or dimethylallyl
pyrophosphate (DMAPP), along one of two possible pathways: the mevalonate or
the methylerythritol-4-phosphate pathway. The route via the mevalonate intermediate
was implemented as an 11-enzyme pathway to DMAPP, including the steps
to recycle ATP, NADH, and acetyl-CoA [14]. Remarkably, a 100% yield on the
carbon derived from PEP was achieved, and the pathway could be expanded by two
more enzymes to produce isoprene. Clearly, making the bulk chemical isoprene
from PEP in this way is not a meaningful concept at large scale, but the approach would
be meaningful for a number of bioactive compounds, and the achieved efficiency was
indeed remarkable. A somewhat shorter version of this pathway was implemented
starting from mevalonate, with four steps to DMAPP and three further steps to
amorphadiene, a precursor of the antimalarial drug artemisinin [15].

Fig. 1 (continued) reactions or the upregulation of genes encoding enzymes involved in bottleneck
reactions. (c) This is coupled with the design of the process for generating streamlined cell-free
extracts or purified proteins. (d) Once conditions and constructs have been chosen, the
resulting strains can be used for producing the enzymes required for the production of a desired
chemical.

Another important class of compounds serving as drug intermediates is the saccharides.
As pointed out above, there is a long history of using cascade reactions to synthesize
monosaccharides or activated monosaccharides, for example to obtain unnatural
saccharides [8] or as precursors to antiviral drugs [16]. Along these lines, the
nucleoside analog 2’,3’-dideoxyinosine (didanosine),
an antiviral precursor, was produced from dideoxyribose using a five-enzyme
cascade reaction. Three of the enzymes were required to produce the didanosine 
and two to recycle ATP from PEP. The pathway was optimized by directed 
evolution of each of the three main pathway enzymes, which led to a substantial 
increase in nucleoside production selectivity and productivity. Interestingly, it also
allowed shortening the pathway to a four-step cascade, as a mutant version of one of
the pathway enzymes had altered selectivity, which allowed the elimination of an
isomerase from the main pathway [17]. D-Fagomine, an iminosugar with antihyperglycemic
effect, can also be obtained in a multi-step one-pot reaction consisting of a four-step
cascade and a subsequent separate chemocatalytic step. In the cascade, PPi-based
phosphorylation of glycerol by an acid phosphatase is followed by oxidation of the
resulting glycerol phosphate by L-glycerol-3-phosphate oxidase (GPO) to obtain
dihydroxyacetone phosphate (DHAP), with concomitant decomposition of the side-product
hydrogen peroxide by catalase [18]. This is an elegant solution to the DHAP
synthesis problem, which has been investigated many times in the past, as DHAP
can act as an aldol donor for a variety of enzymes that allow the production of
vicinal diols of complementary diastereoselectivity [19]. In the present cascade,
DHAP was then used to produce the immediate precursor of D-fagomine by addition
of an aldehyde acceptor and a fructose-bisphosphate aldolase.

Finally, on a more preparative scale, D-psicose, a potential replacement for
traditional sugar, was produced from sucrose in a three-step cascade employing a
hydrolase and two subsequent enzyme-catalyzed isomerization steps
[20]. The unfavorable equilibrium of the isomerization steps was overcome by operating
the cascade integrated with a continuous chromatography step.

Cascade reactions were even used to produce antibiotics, such as the polyketide- 
based bacteriostatic agents enterocin and wailupemycin. They were generated fully 
in vitro by an 11-enzyme cascade using malonate and benzoate as
starting substrates and reconstituting a polyketide synthase [21]. 

Although a number of these cascade schemes were carried out merely on an
analytical scale, cascades also play an important role in the design of high-yield
reaction schemes for the synthesis of optically pure intermediates for pharmaceuticals.
These cascades are typically shorter and therefore also easier to
optimize and scale. Recent examples include the production of berbines from
racemic benzylisoquinolines by employing an enantiospecific berberine bridge
enzyme and the similarly enantiospecific monoamine oxidase-catalyzed oxidation
of the unwanted enantiomer to a prochiral precursor, which can again be converted
in situ to the racemate [22]. Likewise, a three-enzyme cascade was used to improve
the optical purity of 2,5-disubstituted pyrrolidines. Here, an asymmetrically
substituted diketone was first converted by an enantioselective ω-transaminase to
an amine by transamination, coupled to removal of the side-product by
two further enzymatic reactions. The amine cyclized to a substituted pyrroline,
which was then reduced without diastereoselectivity to a diastereomeric
mix of 2,5-disubstituted pyrrolidines. However, by integrating an enantiospecific
monoamine oxidase, one of the pyrrolidine diastereomers from the mix could be
re-oxidized to the pyrroline, which led to a steady enrichment of the other pyrrolidine
diastereomer, up to a final diastereomeric excess of 99% [23]. Such examples of
resolving enantiomeric or diastereomeric mixes are complemented by cascades in
which prochiral starting materials can be converted into diastereomers with some
flexibility in terms of stereoconfiguration, as exemplified by a two-enzyme cascade
for the formation of either norpseudoephedrine or norephedrine [24].
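The steady enrichment of one diastereomer in the pyrrolidine cascade can be rationalized with an idealized cycle model. The sketch below is our own simplification, not the analysis of [23]: it assumes that each cycle completely re-oxidizes the unwanted diastereomer and that the non-selective reduction of the pyrroline gives a 50:50 mixture, so the unwanted fraction halves per cycle. The function names are illustrative.

```python
def diastereomeric_excess(cycles, unwanted_start=0.5):
    """de after a number of idealized oxidation/reduction cycles.

    Idealized model (assumption): each cycle the selective oxidase converts all
    of the unwanted diastereomer back to the pyrroline, which is then reduced
    non-selectively (50:50), so the unwanted fraction halves every cycle.
    """
    unwanted = unwanted_start * 0.5 ** cycles
    wanted = 1.0 - unwanted
    return wanted - unwanted  # de = (major - minor) / total, with total = 1


def cycles_needed(target_de, unwanted_start=0.5):
    """Smallest number of idealized cycles that reaches target_de (< 1)."""
    if not 0.0 <= target_de < 1.0:
        raise ValueError("target_de must lie in [0, 1)")
    n = 0
    while diastereomeric_excess(n, unwanted_start) < target_de:
        n += 1
    return n


for n in (0, 1, 2, 4, 7):
    print(f"{n} cycles: de = {diastereomeric_excess(n):.1%}")
print("cycles needed for de >= 99%:", cycles_needed(0.99))
```

Under these assumptions, seven cycles are the minimum needed to pass 99% de. Real cascades run oxidation and reduction concurrently and incompletely, so the model only explains the geometric character of the enrichment, not its time course.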

Cascade reactions are also explored outside the fine chemical/pharmaceutical
realm, in particular in the energy domain, underlining the great potential of such
systems and also the need for ensuring that the cascades operate optimally. As
discussed further below, the problem of providing multiple enzymes for a cascade can
be addressed by employing thermophilic enzymes. Moreover, moving to in vitro
schemes allows chemical strategies that are more suitable than those
realized in cells. For example, a cascade comprising 11 enzymes was implemented
for the production of ethanol and isobutanol from glucose [25]. The cascade is an
optimized version of the Entner–Doudoroff pathway requiring only one cofactor,
NAD+, and not containing any phosphorylated intermediates. Depending on the
desired product, the initial four-step cascade (from glucose to pyruvate) is
supplemented by a two-step cascade to ethanol or a four-step cascade to isobutanol.
The pathway is balanced with respect to NAD+ reduction and NADH oxidation, and
resulted in yields on glucose of more than 50% for both products.

An alternative biofuel or polymer intermediate, 1,3-propanediol, was obtained
from glycerol in a three-step cascade which used hydrogen to close the redox
balance [26]. More frequently, however, hydrogen is a target of cascade reactions
which aim to produce it from renewable resources, either directly [27-29] or as a
reducing equivalent [30]. At the core of these complex cascades with 10-12
enzymes lies a smart combination of enzymes from glycolysis and the
pentose-phosphate cycle to convert glucose and water to hydrogen (through
NADPH) and CO2, allowing quite impressive yields beyond 95% for the case of
conversion of xylose to xylitol as a biofuel precursor.

In terms of energy transformation, enzymatic cascade reactions are applicable 
not only to the synthesis of biofuels or biohydrogen but also to the generation of 
electricity. The principles outlined above can also be applied to the transfer of 
electrons to an electrode, generating sugar-based biobatteries of considerable 
energy-storage densities. As an illustration, a cascade of 13 enzymes was used for
the complete oxidation of maltodextrin to CO2 and water, in the process donating
electrons to NAD+, which in turn transferred them via diaphorase to a vitamin-based
electron mediator inside an aerated fuel cell [31].

Finally, enzymatic cascade reactions are also used for the formation of mono- 
mers for bulk chemical applications, such as lactic acid from glucose as a renewable 
starting material. Lactic acid production is straightforward with standard glycolysis 
expanded by a lactate dehydrogenase to convert pyruvate to lactate. Standard 
glycolysis would, however, not be balanced in terms of cofactors (netting 2 ATP 
per consumed glucose), and therefore this specific cascade was equipped with a 
non-phosphorylating glyceraldehyde 3-phosphate dehydrogenase, resulting in a 
balanced ten-enzyme cascade with a yield of lactate on glucose of up to 100% 
[32]. This core cascade was later further extended by another seven enzymes for the 
production of n-butanol with a molar yield of 82% [33] or by only one additional
enzyme to produce malate [34]. 
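The cofactor argument for the lactate cascade can be checked with a simple stoichiometric balance. The sketch assumes textbook Embden–Meyerhof stoichiometry and tracks only ATP/ADP and NAD+/NADH; the cofactor-neutral steps (isomerases, aldolase, mutase, enolase) and inorganic phosphate are omitted, and the enzyme set is a simplification rather than the exact set used in [32]. Triose-level steps count twice per glucose.

```python
from collections import Counter

# Cofactor changes per reaction (textbook stoichiometry, an assumption).
# Positive = produced, negative = consumed.
HEXOKINASE = {"ATP": -1, "ADP": +1}
PFK = {"ATP": -1, "ADP": +1}
GAPDH = {"NAD+": -1, "NADH": +1}   # phosphorylating; forms 1,3-bisphosphoglycerate
PGK = {"ADP": -1, "ATP": +1}       # phosphoglycerate kinase
GAPN = {"NAD+": -1, "NADH": +1}    # non-phosphorylating; forms 3-phosphoglycerate
PYRUVATE_KINASE = {"ADP": -1, "ATP": +1}
LDH = {"NADH": -1, "NAD+": +1}


def net_cofactors(steps):
    """Net cofactor change per glucose for (reaction, times_per_glucose) pairs."""
    total = Counter()
    for reaction, times in steps:
        for cofactor, coeff in reaction.items():
            total[cofactor] += coeff * times
    return {name: value for name, value in total.items() if value != 0}


standard = [(HEXOKINASE, 1), (PFK, 1), (GAPDH, 2), (PGK, 2),
            (PYRUVATE_KINASE, 2), (LDH, 2)]
with_gapn = [(HEXOKINASE, 1), (PFK, 1), (GAPN, 2),
             (PYRUVATE_KINASE, 2), (LDH, 2)]

print("glycolysis + LDH:", net_cofactors(standard))
print("GAPN variant + LDH:", net_cofactors(with_gapn))
```

With phosphorylating GAPDH and phosphoglycerate kinase, the balance nets +2 ATP per glucose, whereas replacing both by a non-phosphorylating GAPDH leaves all tracked cofactors balanced, so a closed cascade can run without an additional ATP sink.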


2.2 Cascade Reactions Using Cell-Free Extracts (CFXs)


Despite the advantages of cascades of purified enzymes, their laborious implemen- 
tation might prevent their use. This prompts the question of whether enzymes 
would actually need to be separated from the cell-free extract (CFX) from which 
they are obtained after cultivation of the host organism, and in fact many enzyme 
cascade reactions are implemented with CFXs or perforated cells [25, 35]. This is 
obviously most relevant for enzymes that are produced intracellularly, for example
in the Gram-negative bacterium Escherichia coli, which in practice is still a
very widely used production host. Using recombinant enzymes as part of a CFX also
has the potential benefit that additional enzymes present in the CFX from the
cultivation can also be used for the cascade reaction. In fact, this principle has
been widely exploited in cell-free protein synthesis and the synthesis of activated
mono- and oligosaccharides. In the former, central catabolism, oxidative
phosphorylation, transcription, and translation machineries were exploited to synthesize
proteins [36], with a broad variety of applications [37]. Indeed, CFX-based
protein synthesis was used to produce a variety of biopharmaceuticals such as 
vaccines against influenza [38] and lymphoma [39], antibodies [40, 41], cytokines 
[42], and natural [43] and synthetic viruses [44, 45]. 

In saccharide synthesis, the surviving metabolism of perforated cells was used to 
regenerate cofactors and provide starting materials by conversion of orotic acid and 
glucose [10]. Of course, the additional functionality can also have negative effects. 
For example, the intrinsically complex nature of CFXs makes it difficult to achieve 
reproducibility. However, careful optimization can reduce the corresponding 
problems [46]. 

Interestingly, a conceptually novel enzyme cascade to DHAP was also
implemented, starting from CO2 and employing a novel, computationally designed
formolase to convert formaldehyde into DHAP [47]. Even though this cascade was 
not balanced with respect to cofactor regeneration, it provides an interesting 
perspective on the integration of computationally designed enzymes (which bring 
novel reactions to biochemistry) into in vitro pathways. 

Bioactive compounds were also synthesized with CFXs, specifically to confirm 
the biosynthesis of specific natural compounds such as the antitumor agent 
azinomycin B [48]. In this study, cell lysates from Streptomyces sahachiroi were 
sufficiently powerful to complete full one-pot biosynthesis of both naphthoate 
(a known pathway intermediate) and azinomycin B itself. This was a remarkable 
achievement given the chemical complexity of both compounds. The authors also 
tested a battery of inhibitors and amino acids to clarify the substrate and cofactor 
requirements within the pathway and gained some insight into the mechanism of 
azabicyclic ring formation, which is believed to arise from ornithine. 
3 Assembling Cascade Reactions


As already mentioned, using enzyme cascades instead of, for example, living cells
entails the additional effort of assembling the cascade. This means at least perforating
cells or producing cell lysates, and can go as far as purifying enzymes and
then combining them into the desired system. Although perforating cells has turned
out to be useful even on an industrial scale [11], we do not discuss it further here but
rather refer to seminal reviews that summarize these efforts well [49, 50]. A number of
alternative strategies are discussed below (Fig. 1).


3.1 Heat Purification 


A straightforward and cheap method to purify enzymes to a useful extent is to rely
on recombinant thermophilic enzymes in a mesophilic host, such as E. coli. Here,
cell lysates are prepared after the induction of expression from recombinant genes
and a heating step is applied, during which most native proteins are denatured and
precipitated, but not the heterologous enzymes. An interesting side effect of this
approach is that an increase in enzyme thermostability is often associated with an
increase in process stability, which is of course beneficial and often essential to
process economy [51, 52]. On the other hand, the specific activity of thermophilic
enzymes tends to be highest under the environmental conditions that are optimal
for their native (thermophilic) source organism, which might or might not coincide
with the optimal temperature for operating the cascade reaction. For example, the
previously discussed DHAP is a rather labile product of a cascade reaction, and producing it
at higher temperatures would not be favorable. Consequently, processes relying on
thermostable enzymes are particularly useful if the actual cascade reaction is also 
expected to operate at higher temperatures, for example, to prevent microbial 
contamination in a large-scale process, as would be the case for production of 
biofuels or rare sugars. In agreement with this, the already mentioned cascade 
reaction for the formation of ethanol or isobutanol was assembled from such 
thermostable enzymes and the process operated at 50 °C [25]. Similarly, the lactate, 
malate, and butanol production cascades discussed above were assembled from 
thermophilic enzymes which were separately cloned in E. coli strains and then 
prepared by heat treatment [32-34]. 


3.2 Affinity Tagging 


An alternative that allows purifying multiple proteins in only one step is systematically
equipping cascade members with tags that allow affinity purification or
another form of separation, such as precipitation [53]. However, such purification is
often either expensive (e.g., for the popular six-histidine tag
(His-tag), which requires adsorption on Ni-nitrilotriacetic acid-coated surfaces) or
associated with constructing protein fusions with large domains, which means that a
substantial amount of unwanted material is purified as well. Inactivation of the target
enzyme by the tag may also occur. Nevertheless, a number of studies made 
extensive use of the His-tag, including the biosynthesis of UDP-galactose through 
a seven-enzyme cascade reaction, in which all the enzymes carried a His-tag 
[54]. After purification, the enzymes were also immobilized on Ni-covered agarose 
beads and a higher production yield (~50%) was observed compared to the free 
protein counterpart. Another example of exhaustive usage of His-tags was provided 
by the reconstitution of complete biosynthetic pathways for purine [55] and pyrimidine
[56] from glucose, ammonia, carbonate, creatine phosphate, α-ketoglutarate
(and, for purines, serine) as one-pot cascades comprising 28 and 18 enzymes,
respectively, including up to five cofactor regeneration cycles. Along the same 
lines, the approach was used to provide an alternative to the classical CFX-based 
cell-free protein synthesis systems already mentioned by providing all protein 
elements in a purified form. For that, 38 essential genes were extended (distributed 
over multiple strains) to include the sequence for the His-tag [57]. This initial step 
was facilitated by applying oligonucleotide-mediated mutagenesis [58], which 
allows the expansion of genes directly on the chromosome. 


3.3 Streamlining CFXs 


One of the main disadvantages of CFXs is their increased complexity, so
one alternative to purifying enzymes from complex CFXs is simplifying the
CFX, for example by removing known interfering functions to prevent the consumption
of starting material, intermediates, or products. Over the years, many
interfering functions, for example in cell-free protein synthesis [59] or saccharide 
synthesis [9], were identified and removed. This was also done recently for another 
attempt at addressing the already discussed DHAP synthesis problem: DHAP can 
be produced from glucose in four steps by means of standard glycolysis plus an 
enzyme such as glucokinase that allows the phosphorylation of glucose with ATP. 
ATP can be regenerated by employing the lower part of glycolysis (five enzymes) 
and lactate dehydrogenase (to regenerate NAD+). However, in CFXs ATP is
degraded to ADP and AMP and then hydrolyzed to adenine and ribose phosphate 
by AMP nucleosidase. Deleting the corresponding gene substantially improved 
ATP regeneration [60]. 

Gene deletion is straightforward and a variety of methods are available to 
implement it [61], but it cannot be used if the interfering function is either essential 
or of major importance for normal growth behavior. In that case, conditional removal
becomes interesting, as exemplified in vivo by improved myo-inositol formation
after inducing degradation of a key glycolytic enzyme
[62]. One suitable strategy to achieve this is to tag proteins genetically with a
“degradation tag”, in fact a copy of the sequence (the ssrA tag) that E. coli cells use to
mark proteins whose translation was prematurely terminated for degradation by the
ClpXP machinery [63], with the help of an adaptor protein called SspB. The
intracellular level of proteins whose functional half-life was shortened in this manner
can then be controlled either by stopping induction of the corresponding gene and 
relying on accelerated clearing of the gene product from the cytoplasm, or by 
inducing the adaptor and in this way accelerating degradation [62]. By timing the 
preparation of CFX suitably, such strategies can also be used for streamlining 
CFXs. Similarly, proteins could be equipped with specific proteolysis tags and 
the tagged proteins removed by selective proteolysis. This strategy is well
established for TEV protease-directed intracellular [64] or extracellular [65] hydrolysis.
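The timing argument for tag-mediated depletion before CFX preparation can be made concrete with first-order decay. The half-lives below are hypothetical placeholders, not values from [62, 63], and dilution by growth is ignored in this sketch.

```python
import math


def remaining_fraction(t_min, half_life_min):
    """Fraction of a protein left after t_min of first-order decay."""
    return 0.5 ** (t_min / half_life_min)


def time_to_fraction(target, half_life_min):
    """Minutes until only `target` (0 < target < 1) of the protein remains."""
    return half_life_min * math.log2(1.0 / target)


# Hypothetical functional half-lives (minutes) after expression is switched off;
# dilution by growth is ignored.
scenarios = {"tagged, induction stopped": 20.0, "tagged, adaptor induced": 5.0}
for label, t_half in scenarios.items():
    print(f"{label}: {time_to_fraction(0.05, t_half):.0f} min until 5% remains")
```

For first-order decay, the waiting time needed to reach a given residual fraction scales linearly with the half-life, so shortening the functional half-life fourfold shortens the required delay before harvesting fourfold.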


3.4 Scaffolding 


Multistep cascades generate intermediates which need to reach high concentration 
levels before the subsequent enzyme can operate under conditions of maximum 
rate, requiring the cascade to operate at a high overall concentration of chemical 
compounds, which might interfere with enzyme stability. This problem can be 
reduced by providing the separate enzymes of a cascade as part of a spatially 
organized complex on a scaffold rather than as independent units. In such complexes,
the apparent (local) concentrations of starting materials and intermediates are higher
because the active sites of consecutive enzymes are in close proximity, so faster
catalytic rates are reached at lower concentrations averaged across the reactor volume.
Organized complex formation can be achieved by fusing enzymes to proteins that 
bind to a suitable scaffold, made of either protein [66], DNA [67], or RNA 
[68]. Exploitation for cascade reactions has focused on the highly versatile 
cellulosome scaffold [69] which allows interactions with different fusion partners 
and was exploited to scaffold a three-step cascade to produce fructose-6-phosphate 
from glyceraldehyde 3-phosphate [70]. 
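The concentration argument for scaffolding can be illustrated with a two-step cascade S -> I -> P with irreversible Michaelis–Menten kinetics. Representing proximity as a lower effective K_m of the second enzyme is a crude assumption made only for this sketch, not a mechanistic model of substrate channeling, and all parameter values are hypothetical.

```python
def simulate_cascade(km2, s0=10.0, vmax1=1.0, km1=1.0, vmax2=1.0,
                     t_end=20.0, dt=0.001):
    """Forward-Euler integration of S -> I -> P with irreversible
    Michaelis-Menten rates (arbitrary concentration and time units).

    Returns the concentrations (S, I, P) at t_end.
    """
    s, i, p = s0, 0.0, 0.0
    for _ in range(int(round(t_end / dt))):
        v1 = vmax1 * s / (km1 + s)   # first enzyme consumes S
        v2 = vmax2 * i / (km2 + i)   # second enzyme consumes the intermediate
        s -= v1 * dt
        i += (v1 - v2) * dt
        p += v2 * dt
    return s, i, p


# "Free" enzymes: the second enzyme needs a high bulk level of I (K_m = 5).
free = simulate_cascade(km2=5.0)
# Hypothetical proximity effect, crudely modeled as a lower effective K_m.
scaffolded = simulate_cascade(km2=0.5)

for label, (s, i, p) in (("free", free), ("scaffolded", scaffolded)):
    print(f"{label:>10}: S={s:.2f}  I={i:.2f}  P={p:.2f}")
```

With the lower effective K_m, the second enzyme runs closer to its maximum rate at a lower intermediate level, so less intermediate accumulates and more product forms within the same time.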


4 Encoding Reaction Cascades 


A crucial aspect of cascade reactions is that they operate as a system in which 
different feedback mechanisms lead to behavior that is not necessarily intuitively 
accessible. This might be on the level of productivity, where it turns out that the 
intermediate from a downstream reaction acts as an inhibitor of an upstream 
reaction, as is often the case in metabolic pathways. This might, however, also be 
across different levels of the implementation process, when it turns out that by 
assembling genes into an operon the specific combination of DNA sequences 
inadvertently introduced an additional promoter structure that changes expression 
behavior [71] and thus influences the composition of the cascade after heat 
purification. One aspect of this system concept is that scaling is far from trivial for
cascade reactions, as problems with single elements that are relatively straightfor- 
ward to overcome in a one- or two-member reaction can aggregate in a system to an 
extent that it becomes difficult to deconvolute and address them efficiently. In other 
words, setting up cascade reactions can benefit from a rational design approach, in 
which the various steps of the implementation proceed along a rational, possibly 
standardized, and computationally supported sequence of steps (Fig. 1). Of course, 
this design process does not yet exist [72, 73]. However, across the different steps a 
variety of tools have been introduced which provide valuable assistance, and we 
discuss the design tools and strategies for the optimal assembly of cascade reactions 
in the following sections. 

In these sections we follow the process from selecting the enzymes for the 
pathway to the final construction of a DNA molecule (or a few DNA molecules) 
that actually program the synthesis of the cascade into a bacterial cell (Fig. 2). For 
this, we rely heavily on the methods that have been implemented to set up in vivo 
pathways, for the simple reason that in a first approximation the required steps are 
very similar. We make this argument on two levels. The first level is enzyme
selection. When considering in detail the examples of successful cascade reactions,
it becomes clear that the long cascades in particular consist mostly of enzymes
that were not specifically engineered for the cascade. In other words,
although the engineering of single enzymes remains crucial for the success of the
overall cascade, the “backbone” usually consists of enzymes used in a function well
known from standard biochemistry. In this contribution we therefore acknowledge the
crucial roles that computational enzyme design [74] and directed evolution [75]
play in making key reactions available for cascade reactions, but we focus on
the systems aspect and discuss tools that allow the assembly of systems of enzymes
rather than the design of single enzymes.

The second level is system composition. Ultimately, the performance of a 
cascade reaction depends on its composition. As in any metabolic pathway, the productivity
of an in vitro cascade is subject to metabolic control by its various members [76],
which in turn is exerted through kinetic parameters such as substrate and enzyme 
concentrations, affinities, allosteric interactions, cooperative behavior, degradation 
constants, etc. [77]. When considering the different methods to assemble cascade 
reactions (except for scaffolding), it becomes clear that the composition of the 
cascade is determined by the genetic construct assembled to express the encoded 
genes. Furthermore, it seems reasonable to assume that it is preferable to produce a 
cascade reaction with only one cultivation, in which the cells synthesize all the 
different enzymes to optimal levels, over many separate cultivations, in which each 
cell overproduces only one enzyme. In other words, the task at hand is to assemble 
the genes of a cascade reaction in one or a few bacterial operons, tune the gene 
expression to the optimal level, and, if necessary, optimize the performance. 
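The notion of metabolic control shared among cascade members can be made quantitative with flux control coefficients, C_i = dln J / dln e_i. The sketch below uses a hypothetical linear chain of reversible mass-action steps with clamped boundary metabolites, for which the steady-state flux has a closed form; it is a textbook illustration of metabolic control analysis, not a model of any cascade discussed here.

```python
import math


def chain_flux(e, k, K, s0, sn):
    """Steady-state flux J of a linear chain S0 -> S1 -> ... -> Sn.

    Hypothetical rate law for step i: v_i = e_i * k_i * (S_{i-1} - S_i / K_i),
    with the boundary metabolites S0 and Sn clamped. Setting every v_i = J and
    eliminating the internal S_i gives J = (S0 * prod(K) - Sn) / sum(r_i), where
    the "resistance" of step i is r_i = prod(K_i..K_n) / (e_i * k_i).
    """
    r = [math.prod(K[i:]) / (e[i] * k[i]) for i in range(len(e))]
    return (s0 * math.prod(K) - sn) / sum(r), r


def control_coefficients(e, k, K, s0, sn):
    """Flux control coefficients C_i = dln J / dln e_i = r_i / sum(r)."""
    _, r = chain_flux(e, k, K, s0, sn)
    return [ri / sum(r) for ri in r]


def numeric_control(i, e, k, K, s0, sn, h=1e-6):
    """Finite-difference check of C_i: scale enzyme i by (1 + h)."""
    e_up = list(e)
    e_up[i] *= 1 + h
    j0, _ = chain_flux(e, k, K, s0, sn)
    j1, _ = chain_flux(e_up, k, K, s0, sn)
    return (math.log(j1) - math.log(j0)) / math.log(1 + h)


# Three equal enzyme amounts, but the middle step is intrinsically slower.
e, k, K = [1.0, 1.0, 1.0], [10.0, 1.0, 10.0], [2.0, 2.0, 2.0]
C = control_coefficients(e, k, K, s0=1.0, sn=0.1)
print([round(c, 3) for c in C], "sum =", round(sum(C), 6))
```

In this example the intrinsically slow middle step carries most, but not all, of the control, and the coefficients sum to one (the summation theorem). Raising the amount of the slowest enzyme lowers its own coefficient and shifts control to the other steps, which is one reason why the idea of a single rate-limiting step can be misleading.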
4.1 Computational Design of Novel Reaction Pathways


An increasing number of computational tools are available to identify the enzymes that
formally connect a starting material with a desired product through a set of
already demonstrated or hypothetical reactions [5, 78-80]. Databases such as BRENDA,
KEGG, MetRxn, MicrobesOnline, and SEED [81-85] allow the exploration of known
organism-specific metabolic routes and navigation through non-native combinations of
these individual reactions that lead from a starting metabolite to a desired target
compound. This allows the quick identification of potential reaction networks based
on known and well-described metabolites and enzymatic reactions.
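At its simplest, navigating non-native reaction combinations is a graph search. The following sketch enumerates all routes between two metabolites in a hypothetical toy reaction list by breadth-first search. Real tools work on database-scale networks and handle cofactors, stoichiometry, and atom mapping, none of which is modeled here, and the function name is illustrative.

```python
from collections import deque

# Hypothetical toy reaction list (substrate, product, enzyme class); cofactors
# and stoichiometry are ignored for brevity.
REACTIONS = [
    ("glucose", "glucose-6-P", "kinase"),
    ("glucose", "gluconate", "dehydrogenase"),
    ("glucose-6-P", "fructose-6-P", "isomerase"),
    ("fructose-6-P", "fructose-1,6-bisP", "kinase"),
    ("fructose-1,6-bisP", "DHAP", "aldolase"),
    ("fructose-1,6-bisP", "GAP", "aldolase"),
    ("GAP", "DHAP", "isomerase"),
]


def enumerate_routes(start, target, reactions, max_steps=6):
    """Breadth-first enumeration of all routes without repeated metabolites."""
    graph = {}
    for substrate, product, enzyme in reactions:
        graph.setdefault(substrate, []).append((product, enzyme))

    routes = []
    queue = deque([[(start, None)]])  # each entry: list of (metabolite, enzyme used)
    while queue:
        path = queue.popleft()
        metabolite = path[-1][0]
        if metabolite == target:
            routes.append(path)
            continue
        if len(path) - 1 >= max_steps:
            continue
        seen = {m for m, _ in path}
        for product, enzyme in graph.get(metabolite, []):
            if product not in seen:
                queue.append(path + [(product, enzyme)])
    return routes


for route in enumerate_routes("glucose", "DHAP", REACTIONS):
    print(" -> ".join(m for m, _ in route), f"({len(route) - 1} steps)")
```

Breadth-first order returns shorter routes first, which is a convenient default ranking before applying the more elaborate scoring criteria discussed in this section.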

However, there are a number of challenges involved: a variety of pathways are 
often possible from starting material to product and the optimal path is not neces- 
sarily obvious. Sometimes the opposite is true and chemical conversions are 
required for which no biological counterpart is known. When the enzymes for the 
cascade reactions are produced in one host, then they can effectively be seen as an 
in vivo pathway, and possible intermediates and products of this artificial pathway 
might interfere with the reactions in a host organism. The selected enzymes (or the 
host’s enzymes) might display poor selectivity, leading to unanticipated conse- 
quences in the host. Finally, if CFXs are applied, cellular pathways might direct 
intermediates into unproductive side reactions [86]. These influences call for tools
that combine pathway enumeration with an evaluation of possible consequences
and, ideally, the inclusion of novel reactions that are similar to but
distinct from those already available in databases.

A number of methods have been proposed to identify these types of de novo
pathways, including candidate enzymes for the new reaction
steps, based on similar known reactions. They all revolve around a formalized
representation of enzyme-catalyzed reactions which allows abstraction
from a specific substrate/product pair (to include novel reactions based on known
reactions but with novel substrate specificity) and, concomitantly, a mathematical
representation that allows computer-supported network generation. BNICE [87], for
instance, predicts pathways by combining reactions according to the first three of
the four levels of the enzyme classification system [88], which identifies
enzymes by reaction type but not by detailed substrate specificity. Effectively, this
allows representation of starting materials, products, and enzymes/reactions by
bond-electron matrices, which can be systematically transformed so that existing
reaction types can be applied to novel substrates. The system was expanded to
include thermodynamic considerations for ranking [89] and additional layers in
which predictions about particularly suited enzyme scaffolds and specific suggestions
for enzyme engineering were included [90].

Alternatively, reactions that are supposed to be considered for generating the 
network of possible reactions connecting a starting material and a product can be 
formulated by a limited number of reaction rules, which represent a large fraction of 
reactions collected in a central database such as KEGG at a level below substrate 
specificity [91]. Enzymes are also associated with reaction rules, and in this way a 
network of potential enzyme reactions is implemented and in a next step evaluated
based on local similarity between molecules, similarity between entire structures, 
thermodynamic feasibility, pathway distance, and the network of the host organism. 
For scoring, the importance of the different elements can be calibrated with 
training sets. 
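The multi-criteria evaluation can be sketched as a weighted score over candidate pathways. The features, weights, and candidate routes below are all hypothetical. In the approach described above, the weights would be calibrated on training sets rather than set by hand, and thermodynamic feasibility would come from estimated Gibbs energies rather than the crude sign check used here.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    similarity: float     # 0..1, how closely the rule match resembles known reactions
    delta_g: float        # overall Gibbs energy change, kJ/mol (negative = favorable)
    steps: int            # pathway length
    host_conflicts: int   # intermediates that host enzymes may also act on


# Hypothetical weights; in practice these would be calibrated on a training set.
WEIGHTS = {"similarity": 2.0, "thermo": 1.0, "length": 0.3, "host": 0.5}


def score(c: Candidate) -> float:
    thermo = 1.0 if c.delta_g < 0 else 0.0   # crude feasibility indicator
    return (WEIGHTS["similarity"] * c.similarity
            + WEIGHTS["thermo"] * thermo
            - WEIGHTS["length"] * c.steps
            - WEIGHTS["host"] * c.host_conflicts)


candidates = [
    Candidate("route A", similarity=0.9, delta_g=-40.0, steps=4, host_conflicts=2),
    Candidate("route B", similarity=0.6, delta_g=-15.0, steps=3, host_conflicts=0),
    Candidate("route C", similarity=0.95, delta_g=5.0, steps=5, host_conflicts=1),
]
ranked = sorted(candidates, key=score, reverse=True)
for c in ranked:
    print(f"{c.name}: score {score(c):.2f}")
```

Route B ranks first here despite its lower rule similarity because it is shorter, thermodynamically favorable, and free of host conflicts; different weights can reorder the list, which is why calibration matters.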

Finally, conversions of compounds can also be implemented by transformation 
of molecular signatures. Depending on the resolution with which molecules are to 
be represented, the molecular signatures can be computed at different heights, 
resulting in reaction networks of increasing size for decreasing height. Once the 
reaction space is defined in this way, pathways are ranked again according to 
thermodynamics, enzyme availability or selectivity, and product toxicity [92]. Com- 
binations and variations of these approaches which use different criteria for path- 
way scoring have also become available (Fig. 2) [93-97]. 
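The effect of signature height can be illustrated with a simplified atom-signature scheme. The code below uses an iterative neighbor-label refinement (in the style of Weisfeiler–Lehman) as a stand-in for the molecular signatures of [92], ignores hydrogens and bond orders, and compares two small hypothetical heavy-atom graphs by the Jaccard similarity of their signature sets.

```python
# Simplified atom signatures (neighbor-label refinement; hydrogens and bond
# orders ignored). Molecules are hypothetical heavy-atom adjacency lists.
ETHANOL = {"atoms": ["C", "C", "O"], "bonds": [(0, 1), (1, 2)]}
PROPANOL = {"atoms": ["C", "C", "C", "O"], "bonds": [(0, 1), (1, 2), (2, 3)]}


def atom_signatures(mol, height):
    """Return the set of atom signatures of a molecule at the given height."""
    n = len(mol["atoms"])
    neighbors = [[] for _ in range(n)]
    for a, b in mol["bonds"]:
        neighbors[a].append(b)
        neighbors[b].append(a)
    sig = list(mol["atoms"])  # height 0: the element symbol alone
    for _ in range(height):
        # Each new signature wraps the previous-height signatures of the neighbors.
        sig = [
            mol["atoms"][a] + "(" + ",".join(sorted(sig[b] for b in neighbors[a])) + ")"
            for a in range(n)
        ]
    return set(sig)


def jaccard(a, b):
    return len(a & b) / len(a | b)


for h in range(3):
    sim = jaccard(atom_signatures(ETHANOL, h), atom_signatures(PROPANOL, h))
    print(f"height {h}: signature similarity {sim:.2f}")
```

At height 0 the two molecules are indistinguishable, and larger heights separate them progressively. A reaction rule expressed at low height therefore matches more substrates, which is consistent with lower heights yielding larger reaction networks.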

In summary, a number of tools to predict feasible pathways are available which 
integrate a large number of criteria into selecting an appropriate pathway and apply 
at least formal methods to suggest novel enzyme reactions that are required to 
obtain a pathway. These suggestions can then be followed up by more sophisticated 
methods, or alternatively novel reactions, such as computationally designed reac- 
tions, can be included in the definition of the network which is subsequently scored 
to identify the most promising pathway. Although criteria such as impact on the 
growth behavior might not be a prime concern for implementing cascade reactions, 
the integrated scoring criteria in general are very helpful for pathway selection. 


4.2 Controlling and Optimizing System Composition 


Practical implementations of novel reaction systems do not easily reach satisfactory
productivities and require optimization when progressing from an initial concept of
a biocatalytic network to an applied system creating economic value. As
pointed out before, the crucial factor for the optimization is control over relative
and absolute protein levels of each introduced enzyme [98]. Conceptually,
optimization can follow rational or combinatorial approaches,
and both have been implemented.

Rational approaches mostly consist of straightforward analyses that identify rate-limiting
steps by evaluating the effect of systematically increasing the concentration of one
cascade member at a time. Even though the concept of a rate-limiting step can be
misleading in pathways [99], the approach is frequently applied, particularly if the
cascade members are available in purified form. For example, in the aforementioned
formolase cascade for the formation of DHAP [47], titration experiments were used to
identify rate-limiting steps. When the enzymes are understood well enough, kinetic
modeling can support the identification of the limiting step. For example, the
production of hydrogen from cellulose was optimized using a model built on rate
equations for the enzymes involved, with the kinetic parameters fitted to the observed
experimental data [29]. Alternatively, statistical approaches can be followed: the
seven-step cascade to produce amorphadiene from mevalonate discussed above was
optimized under two-phase reaction conditions for several variables, such as absolute
and relative enzyme levels, the type and concentration of monovalent ions, the
concentration of magnesium ions, and pH [15]. The authors systematically evaluated
16 different combinations of enzyme levels designed with a factorial orthogonal
array and response surface methodology, and identified the reactions consuming
farnesyl pyrophosphate and producing amorphadiene as bottlenecks.
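The exact design used in that study is not reproduced here. As a hedged illustration of what a 16-run orthogonal array can look like, the sketch below builds the standard L16(4^5) array, which accommodates up to five factors (for example, five enzymes) at four levels each. Which enzymes were varied, how many levels were used, and the concentrations assigned to each level (`LEVELS` below) are assumptions for illustration only.

```python
from itertools import combinations

# Multiplication table of the finite field GF(4) with elements 0, 1, 2 (= x), 3 (= x + 1);
# addition in GF(4) is bitwise XOR.
GF4_MUL = [
    [0, 0, 0, 0],
    [0, 1, 2, 3],
    [0, 2, 3, 1],
    [0, 3, 1, 2],
]


def l16_orthogonal_array():
    """Return the 16-run, 5-factor, 4-level orthogonal array L16(4^5).

    Each run is indexed by a pair (a, b) over GF(4). The five columns are
    a, b and a + m*b for m = 1, 2, 3, which yields strength 2: any two
    columns contain every combination of levels exactly once.
    """
    runs = []
    for a in range(4):
        for b in range(4):
            runs.append([a, b] + [a ^ GF4_MUL[m][b] for m in (1, 2, 3)])
    return runs


def is_strength_two(runs, levels=4):
    """Check that every pair of columns covers all level pairs exactly once."""
    n_cols = len(runs[0])
    for i, j in combinations(range(n_cols), 2):
        if len({(run[i], run[j]) for run in runs}) != levels * levels:
            return False
    return True


# Hypothetical mapping of the four levels to relative enzyme loadings.
LEVELS = [0.5, 1.0, 2.0, 4.0]
design = [[LEVELS[level] for level in run] for run in l16_orthogonal_array()]
```

Each row of `design` corresponds to one reaction to set up; the responses of the 16 runs can then be fitted with a response surface model to locate promising enzyme ratios with far fewer experiments than the 4^5 = 1,024 runs of a full factorial.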

A different rational approach was taken for the lactate-producing cascade discussed
above. Here, the authors conducted a series of initial experiments in which they
expressed each cascade gene separately in a recombinant strain and determined the
mRNA level and the specific productivity of the recombinant enzyme. In this way,
they obtained a rough indication of which cascade member would require the most
overproduction, and they used this information to arrange the corresponding genes
in an operon, under the hypothesis that the first gene in an operon is the most
highly expressed [100]. The resulting CFX was indeed fourfold more productive than
an extract assembled from the same amount of biomass of strains that each
overexpressed only one gene.
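The ordering logic described above can be sketched as follows, assuming a linear cascade in which every step must carry the same flux (1:1 stoichiometry) and that the activity measured for each gene expressed on its own is a usable proxy for how much that gene must be overproduced. The gene names, activities, and units are hypothetical.

```python
def order_operon(specific_activity, target_flux):
    """Order genes so that the one needing the most overproduction comes first.

    specific_activity: gene -> activity per unit biomass measured when the gene
        is expressed on its own (hypothetical units).
    target_flux: flux that every step of a linear, 1:1 stoichiometric cascade
        must sustain, in the same units.
    Returns gene names sorted by required overproduction, highest first, following
    the hypothesis that the first gene of an operon is expressed most strongly.
    """
    demand = {gene: target_flux / act for gene, act in specific_activity.items()}
    return sorted(demand, key=demand.get, reverse=True)


# Hypothetical measurements for a three-enzyme cascade.
activities = {"enzA": 2.0, "enzB": 0.5, "enzC": 1.0}
print(order_operon(activities, target_flux=1.0))  # ['enzB', 'enzC', 'enzA']
```

For branched cascades or steps with other stoichiometries, the required flux per step would have to be scaled accordingly before ranking.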

Although all these approaches are straightforward and effective, their focus on the
formation of the end product means that they do not fully reflect the systems
character of the cascade, and it remains unclear whether the improvements they
suggest can be transferred to larger scales. A more comprehensive record of
concentrations, including those of the intermediates, would help here. Such a record
has become possible by integrating online mass spectrometry into a continuous
reactor setup, which has allowed the compositions of substrates, most of the
intermediates, and products to be monitored in response to additions of enzymes
while optimizing the formation of DHAP [101] (Fig. 3). After three rounds of
optimization of the operon design, DHAP productivity was improved 2.5-fold.

The alternative to rational approaches to system optimization are combinatorial
approaches, that is, approaches in which the performance space of a system is
explored by unbiased recombination of cascade elements, followed by identification
of the best-performing composition. Although this can, in principle, also be done
with purified enzymes, our ever-increasing proficiency in manipulating DNA
molecules provides another straightforward approach, which transfers the laborious
implementation of combinatorial schemes from, for example, a robot (varying enzyme
concentrations according to a pre-programmed pipetting scheme) to chemistry
(varying enzyme concentrations by varying transcriptional and translational signals).
This coincides with the fact that controlling intracellular protein levels has been a
main target of synthetic biology in recent years.
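To make the scale of a combinatorial search concrete: if each of n genes can be set to one of k expression levels, there are k^n possible compositions, which quickly exceeds what can be pipetted by hand. The sketch below enumerates and counts such a design space. The gene names and the three-level scheme are hypothetical.

```python
import math
from itertools import product


def design_space_size(levels_per_gene):
    """Number of distinct compositions when each gene can take any of its levels."""
    return math.prod(len(levels) for levels in levels_per_gene.values())


def enumerate_designs(levels_per_gene):
    """Yield every combination of expression levels as a gene -> level dict."""
    genes = list(levels_per_gene)
    for combo in product(*(levels_per_gene[g] for g in genes)):
        yield dict(zip(genes, combo))


# Hypothetical example: seven cascade genes, each at a low, medium or high level.
cascade = {f"gene{i}": ["low", "medium", "high"] for i in range(1, 8)}
print(design_space_size(cascade))  # 2187
```

Encoding the levels in DNA means that a single pooled library can, in principle, sample this space, whereas a pipetting robot would need one reaction per composition.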

The corresponding efforts were undertaken at various levels (Fig. 2), including
efforts to standardize the construction process as well as the construction of suites
of parts and tools to implement diversified protein production signals. The Registry
of Standard Biological Parts (http://partsregistry.org) with its BioBrick™ standard
for biological parts, for instance, provides a collection of genetically encoded parts
together with rules for physical composition and guidelines for functional
composition and characterization [102, 103]. Another community-driven standard is
the Synthetic Biology Open Language (SBOL) [104, 105], a data standard that aims to
facilitate the design and exchange of novel biological systems. Vectors were also
subjected to standardization efforts, leading to the Standard European Vector
Architecture (SEVA) [106, 107], which provides rules for the construction and
nomenclature of prokaryotic plasmids together with an online and a physical
database of various characterized vector designs.

Fig. 3 Workflow for a semi-rational cascade optimization. A cascade (here for the production of
dihydroxyacetone phosphate (DHAP) based on standard glycolysis with recycling of ATP and
NAD+) is assembled in an enzyme membrane reactor, where it can be freely perturbed with
substrates and compounds. In this example, the cascade is part of a cell-free extract. The effluent is
immediately analyzed by mass spectrometry at a high time resolution (multiple measurements per
minute). This way, not just the changes in concentration of the target compound (DHAP) but those of
the entire spectrum of compounds can be followed. This system can be used to add enzymes
selectively to identify cascade compositions that operate with an improved productivity, which
can be used to inform the change in expression signals of key genes, so that the next cell-free
extract approximates more closely the previously identified best composition

The SEVA format allows a quick exchange of plasmid origins of replication, and
the resulting changes in plasmid copy number (between 1 and several hundred [108]),
and consequently in gene dosage, are a first important factor governing protein
levels [105, 109-111]. However, gene dosage only sets a baseline for DNA levels.
Promoter activity controls the rate at which mRNAs are produced, and several
attempts have been made to systematize the measurement of the somewhat poorly
defined parameter “promoter strength” [112-117]. These efforts resulted in the
creation of several sets of promoters that allow protein expression to be tuned
at the transcriptional level [116, 118-122]. Access to characterized promoters
with different strengths has facilitated combinatorial screening approaches to
optimize the expression levels of multi-enzyme pathways [118, 119, 123].
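As a rough way to reason about how gene dosage and promoter choice combine, the sketch below treats each layer as a multiplicative factor on protein output. This is a first-order simplification assumed here for illustration (it ignores saturation of the expression machinery, metabolic burden, and mRNA or protein stability), and the origin and promoter names and values are hypothetical placeholders rather than characterized SEVA parts.

```python
import math
from itertools import product


def relative_expression(copy_number, promoter_strength, rbs_strength=1.0):
    """First-order estimate: each layer scales protein output multiplicatively."""
    return copy_number * promoter_strength * rbs_strength


def closest_construct(target, origins, promoters):
    """Pick the (origin, promoter) pair whose estimate is closest to target on a log scale.

    origins: name -> approximate plasmid copy number
    promoters: name -> promoter strength relative to a reference promoter
    """
    def log_error(pair):
        origin, promoter = pair
        estimate = relative_expression(origins[origin], promoters[promoter])
        return abs(math.log(estimate / target))

    return min(product(origins, promoters), key=log_error)


# Hypothetical parts; copy numbers and strengths are placeholders, not measured values.
origins = {"low_copy": 5, "high_copy": 100}
promoters = {"weak": 0.1, "strong": 1.0}
print(closest_construct(10, origins, promoters))  # ('high_copy', 'weak')
```

Comparing on a log scale reflects that expression levels span orders of magnitude, so a twofold miss counts the same whether the target is high or low.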

Ultimately, large cascade reactions need to be encoded in oligo-gene operons, 
which calls for another layer of regulation at a finer resolution than the promoter, 
for example translational signals specific to each gene. Advances in biophysical
models describing interactions of the ribosome with the 5′-UTR, that is, the
ribosome binding site (RBS) involved in translation initiation, provide the means
to control protein production over several orders of magnitude in a predictive
manner [124-127]. On the basis of the Gibbs free energy difference between the
folded mRNA and the translation initiation complex assembled from the 30S
ribosomal subunit and the mRNA transcript, these thermodynamic models predict
relative expression levels based only on the mRNA sequence. Consequently, a broad
variety of protein synthesis signals can be generated with reduced effort by using
these models for RBS design, even if they are not accurate enough to allow direct
design (Fig. 2).
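The core relation behind such thermodynamic RBS models can be sketched as a Boltzmann-type dependence of the initiation rate on the total free energy change, rate ∝ exp(-β ΔG_total). The value of β used below (about 0.45 mol/kcal) is the figure commonly cited for the original RBS Calculator and is treated here as an assumption; the functions are illustrative and are not the RBS Calculator's API.

```python
import math

# Apparent Boltzmann factor in mol/kcal. Assumed value: roughly 0.45 mol/kcal was
# reported for the original RBS Calculator; treat it as a fitted parameter.
BETA = 0.45


def relative_initiation_rate(dg_total, dg_reference, beta=BETA):
    """Translation initiation rate of an RBS relative to a reference RBS.

    Assumes rate ∝ exp(-beta * dG_total), where dG_total (kcal/mol) is the free
    energy change from the folded mRNA to the assembled 30S initiation complex;
    more negative values therefore predict stronger translation.
    """
    return math.exp(-beta * (dg_total - dg_reference))


def dg_shift_for_fold_change(fold_change, beta=BETA):
    """Change in dG_total (kcal/mol) predicted to give the requested fold change."""
    return -math.log(fold_change) / beta


# With beta = 0.45, about -5.1 kcal/mol corresponds to a tenfold increase.
print(round(dg_shift_for_fold_change(10), 2))  # -5.12
```

Because only relative rates are predicted, the sketch compares each RBS against a reference sequence rather than returning absolute protein levels.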

One of the advantages of targeting the RBS for optimization of expression levels 
is that large changes in protein translation result from only small changes in the 
RBS sequence [128], thus simplifying the generation of libraries covering large 
portions of the accessible protein expression space. Moreover, based on available 
model predictions, the RBSs can be forward engineered and evaluated in silico, 
allowing a focused search for optimal expression level combinations [129]. For 
instance, Zelcbuch et al. [130] used the forward engineering capabilities of the RBS 
Calculator [125] to define a small set of RBSs that potentially covers a large range 
of expression levels. Applying this set to a combinatorial library for a branched 
carotenoid biosynthesis pathway revealed a large diversity of carotenoid produc- 
tivities, in one instance outperforming previous pathway engineering efforts by a 
factor of four for the production of the industrially valuable compound astaxanthin. 
Nowroozi et al. [131] also included the use of the RBS Calculator predictions in 
their efforts to construct a combinatorial operon library of isoprenoid production 
pathways to improve the production of amorphadiene in E. coli. Because only a few
base pairs need to be changed, this method is also well suited to changing the
expression level of genes located on the chromosome. For example, a synthetic
Entner–Doudoroff pathway was introduced into the genome of E. coli [132], and its
RBS signals were optimized in a combinatorial fashion using oligo-mediated,
recombineering-based combinatorial RBS screening of the pathway's operon, which
led to an increased NADPH regeneration rate. Similar approaches of combinatorial
modulation of RBSs within an operon-encoded pathway had been applied previously
[133-135].
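The idea of defining a small RBS set that still spans a wide expression range can be sketched as picking, from a pool of candidate sequences with model-predicted strengths, those closest to evenly spaced targets on a logarithmic scale. This greedy selection is an illustrative assumption, not the procedure used by Zelcbuch et al.; the candidate names and strengths are hypothetical.

```python
import math


def pick_rbs_set(candidates, n_levels):
    """Choose RBSs whose predicted strengths are roughly evenly spaced on a log scale.

    candidates: RBS name -> predicted relative strength (> 0), e.g. from a
        thermodynamic model. Requires 2 <= n_levels <= len(candidates).
    Greedily takes, for each log-spaced target, the closest RBS not yet chosen.
    """
    logs = {name: math.log10(s) for name, s in candidates.items()}
    lo, hi = min(logs.values()), max(logs.values())
    step = (hi - lo) / (n_levels - 1)
    chosen = []
    for k in range(n_levels):
        target = lo + k * step
        remaining = [name for name in logs if name not in chosen]
        chosen.append(min(remaining, key=lambda name: abs(logs[name] - target)))
    return chosen


# Hypothetical predicted strengths (arbitrary units).
pool = {"rbs1": 1, "rbs2": 8, "rbs3": 100, "rbs4": 1000, "rbs5": 12}
print(pick_rbs_set(pool, 3))  # ['rbs1', 'rbs5', 'rbs4']
```

With three such RBSs per gene, a four-gene operon already yields 3^4 = 81 combinations, which is small enough to screen yet spans three orders of magnitude per gene.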

Another method to differentiate expression of the genes within an operon is to produce
transcripts of different lengths from the same promoter by inserting transcriptional
terminators of intermediate strength into intergenic regions, so that downstream
recombinant genes are sometimes part of the transcript and sometimes not, again
contributing to the control of the cellular protein level. At the other end of the scale,
preventing read-through from a heavily transcribed operon into downstream sections of
the genome or a plasmid is desirable and requires efficient terminators. This has led to
the development of several suites of terminators to address such needs. Cambray
et al. [136] developed a genetic architecture that enabled reliable determination of
terminator efficiencies, and Chen et al. [137] used a similar approach to characterize a
large library of terminators for use in synthetic systems.
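The effect of intermediate-strength terminators on downstream genes can be illustrated with a simple cumulative model: the fraction of transcripts reaching a gene is the product of (1 - efficiency) over all terminators upstream of it. The model assumes a single promoter and ignores mRNA decay and internal promoters; the efficiencies used are hypothetical.

```python
def transcript_coverage(termination_efficiencies):
    """Fraction of promoter-initiated transcripts that extend into each gene of an operon.

    termination_efficiencies[i] is the fraction (0 to 1) of polymerases stopped by
    the terminator between gene i and gene i + 1; the first gene sees all transcripts.
    Assumes a single promoter and ignores mRNA decay and internal promoters.
    """
    coverage = [1.0]
    for efficiency in termination_efficiencies:
        coverage.append(coverage[-1] * (1.0 - efficiency))
    return coverage


# Two intermediate-strength terminators (hypothetical 50 % efficiency each).
print(transcript_coverage([0.5, 0.5]))  # [1.0, 0.5, 0.25]
# Read-through past a strong terminal terminator (hypothetical 99 % efficiency).
print(transcript_coverage([0.99])[-1])
```

The same product explains why an efficient terminator is needed at the end of a heavily transcribed operon: even 99 % termination lets about 1 % of transcripts run into downstream sequence.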

The tools mentioned above, combined with efforts to use DNA synthesis to 
remove known or opaque gene-internal regulatory sequences, are often considered 
the toolbox of pathway “refactoring” [138], in which a pathway of potentially 
diverse origins is taken out of its native regulatory context and recast into a format 
in which its performance is optimal from the point of view of the operator. Clearly, 
the criteria that apply to the in vivo setting in which this strategy is typically applied 
and to the in vitro setting of cascade reactions are very similar, so that refactoring is 
also a promising tool in cascade reaction optimization. 


4.3 DNA Assembly


Once the different parts required for refactoring the pathway are available, they
need to be assembled into one or a few operons [61]. Laboratory assembly of DNA
has been around for several decades [139]; however, it has recently become faster
and simpler, enabling the construction of increasingly complex constructs
[140]. Here, we compare and contrast some of the available methods. Current
large-scale DNA assembly methods fall into three main categories: (1) those
based on rounds of restriction digestion and ligation, (2) in vitro homology-based
methods, and (3) in vivo homology-based methods.

Methods that fall into category 1 include BioBrick, BglBrick, and Goldengate
cloning [141-143]. In general, these methods work by PCR amplification of modules
(or parts) with primers containing the required restriction enzyme recognition
sites at the 5′ end. The PCR products are then purified, cut, and ligated into a vector
conforming to the BioBrick, BglBrick, or Goldengate standard. For the BioBrick and
BglBrick standards, it is critical that the 3′ end of part I is cut with a different
enzyme than the 5′ end of part II, but that these two restriction enzymes yield
compatible ends that, once ligated, no longer form a restriction enzyme site.
This allows for iterative digestion and ligation of additional modules, which lends
itself to recursive, automated workflows. However, one of the major disadvantages
of the BioBrick and BglBrick methods is that a scar is formed at the junction of
each module. In the case of BioBrick this is an 8-bp scar; because 8 bp is not a
multiple of three, the scar shifts the reading frame, which limits the use of BioBrick
assembly to larger, non-coding regions. The BglBrick standard improves on this by
using BglII/BamHI restriction sites that leave a 6-bp scar, which encodes a
glycine-serine dipeptide when placed into a coding region [141]. Furthermore, the
parts are assembled sequentially, such that assembly of ten modules would require
ten individual rounds of cloning, which can be costly and time-consuming. However,
parallel assembly of parts (in multiple pots), the use of repeating parts
(promoters, RBSs), the simplified design of sequential additions of parts by
algorithms [144], and liquid-handling robotics [145] can help to minimize these
burdens.
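The gain from parallel assembly can be sketched by counting cloning rounds under two hypothetical schemes: strictly sequential addition of one part per round, and a hierarchical scheme in which all parts are first cloned in parallel and intermediates are then joined pairwise. Both counting conventions are assumptions made for illustration; real workflows add verification and transformation steps that are not counted here.

```python
def sequential_rounds(n_parts: int) -> int:
    """Cloning rounds when one part is added to the growing construct per round."""
    return n_parts


def hierarchical_rounds(n_parts: int) -> int:
    """Cloning rounds when parts are combined pairwise in parallel pots.

    Round 1 clones every part into its own vector in parallel; each further round
    joins the intermediates pairwise, so ceil(log2(n)) merging rounds are needed.
    """
    merge_rounds = (n_parts - 1).bit_length()  # equals ceil(log2(n)) for n >= 1
    return 1 + merge_rounds


for n in (4, 10, 20):
    print(n, sequential_rounds(n), hierarchical_rounds(n))
```

Under these assumptions, ten modules need ten sequential rounds but only five hierarchical ones, and the gap widens as the number of parts grows.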

Other in vitro methods, such as Goldengate cloning, can be performed in parallel
and entirely avoid the introduction of a scar by using Type IIS restriction enzymes,
which cut outside their recognition sequence [142]. Because the resulting overhang
sequence can be unique to each junction, multiple parts can be assembled at once
in a defined order. As many as 68 parts have been successfully assembled by
Goldengate cloning in three one-pot assembly reactions [146]. Like the other
category 1 methods, Goldengate cloning requires that all parts being assembled are
free of internal recognition sites for the enzymes used. In most cases this can be
accommodated by introducing small changes in the nucleotide sequence of the
modules to remove conflicts. However, if these sites occur in coding or regulatory
regions, such changes can alter the expression of the proteins of interest.
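Checking parts for internal recognition sites before Goldengate assembly amounts to scanning both strands for the enzyme's site. The sketch below defaults to GGTCTC, the recognition sequence of BsaI, one commonly used Type IIS enzyme; the chapter does not name a specific enzyme, so this choice and the example sequence are assumptions.

```python
def reverse_complement(seq: str) -> str:
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_sites(seq: str, site: str = "GGTCTC") -> list[int]:
    """0-based start positions of a recognition site on either strand of seq.

    The default, GGTCTC, is the recognition sequence of BsaI; pass another site
    for a different Type IIS enzyme.
    """
    seq = seq.upper()
    hits = []
    # A set avoids counting palindromic sites twice.
    for motif in {site.upper(), reverse_complement(site)}:
        start = seq.find(motif)
        while start != -1:
            hits.append(start)
            start = seq.find(motif, start + 1)
    return sorted(hits)


# Hypothetical part with one site on each strand.
part = "AAGGTCTCAAGAGACCAA"
print(find_sites(part))  # [2, 10]
```

Any hit inside a part flags a position where a small sequence change would be needed; inside a coding region, a synonymous codon change is the usual way to keep the protein unchanged.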

So-called sequence-independent methods overcome these restrictions by using
homology-based assembly, which involves either the in vitro resection and annealing of
homologous regions (category 2) or in vivo homologous recombination (category
3). An advantage of these methods over category 1 methods is that it is unnecessary
to alter the sequence of any of the parts being assembled. In vitro homology-based
methods include circular polymerase extension cloning (CPEC) [147], sequence
and ligation independent cloning (SLIC) [148], and Gibson or isothermal assembly
(ITA) [149].

All category 2 in vitro methods require that the modules of interest end in 
sequences homologous to those on the ends of the neighboring modules. These 
“overlapping” modules are in general constructed by PCR with primers adding the 
homologous region to the 5’ end. All parts to be assembled are mixed together and 
incubated in an annealing mixture in which the homologous regions are exposed as 
single-stranded units and the opposing ssDNA strands are annealed. 

In the case of CPEC, a form of overlap extension PCR, this occurs through rounds of
denaturation, annealing, and extension in which the annealed vector and insert use
each other as templates until the construct is circularized. The final plasmid
contains two nicks that are repaired upon transformation. Successful assembly by
CPEC relies on the overlapping regions of adjacent modules having nearly identical
melting temperatures (Tm) [147]. Up to four modules, generating an 8.4-kb construct,
have been combined by CPEC; assembly of a greater number of modules or of larger
modules would likely require other cloning methods [150]. The advantage of CPEC over
other in vitro assembly methods is that it uses common PCR reaction components
and does not require expensive kits or enzymes not already found in most
laboratories.
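Because CPEC depends on overlaps with similar melting temperatures, overlap design can be sketched as growing each overlap from the junction until its estimated Tm reaches a common target. The basic GC-content formula used below, Tm = 64.9 + 41 × (nGC − 16.4)/N, is a crude approximation chosen here for simplicity; nearest-neighbor models with salt corrections are more accurate. Length limits and the target value are assumptions.

```python
def basic_tm(seq: str) -> float:
    """Rough melting temperature in °C: 64.9 + 41 * (nGC - 16.4) / N.

    A crude GC-content approximation for sequences longer than about 13 nt;
    nearest-neighbor models with salt corrections are more accurate.
    """
    seq = seq.upper()
    n_gc = seq.count("G") + seq.count("C")
    return 64.9 + 41.0 * (n_gc - 16.4) / len(seq)


def design_overlap(upstream_end: str, target_tm: float, min_len: int = 15, max_len: int = 40):
    """Shortest 3' end of the upstream module whose estimated Tm reaches target_tm.

    Returns None if no overlap within [min_len, max_len] is long enough.
    """
    for n in range(min_len, min(max_len, len(upstream_end)) + 1):
        overlap = upstream_end[-n:]
        if basic_tm(overlap) >= target_tm:
            return overlap
    return None


print(round(basic_tm("GCGCGCGCGCATATATATAT"), 2))  # 51.78
```

Applying the same `target_tm` at every junction keeps the overlaps matched, which is the property CPEC relies on; AT-rich junctions simply end up with longer overlaps.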

In contrast to CPEC, SLIC and ITA use exonuclease activity, provided by T4 DNA
polymerase and T5 exonuclease, respectively, to resect the dsDNA, thus generating
extended single-stranded regions that support annealing of the homologous regions.
In SLIC, once the homologous regions are annealed, only one dNTP is added to
arrest the exonuclease activity of T4 DNA polymerase [148]. This results in a
construct with nicks or gaps on either side of each fragment, which are repaired upon
transformation. In ITA, a similar principle applies, but ligation is integrated into the
in vitro step. T5 exonuclease, Phusion polymerase, and Taq ligase are mixed in a
single step: the T5 exonuclease resects the dsDNA to allow annealing, whereas Phusion
polymerase fills in the gaps and the ligase seals the nicks. In essence, the method
relies on the balance of activity between the polymerase and the exonuclease, with the
exonuclease ultimately being heat-inactivated at the standard temperature of this
step [149]. SLIC and ITA can be used to generate constructs containing ten
modules and of widely varying size, and as such are preferable to CPEC for
generating complicated constructs with many parts.

It is important to note that the USER method [151] and a hierarchical method
similar to SLIC [152] are also very useful methods of ligation-independent cloning;
however, they are not discussed in this chapter as they are not entirely sequence
independent.

The main disadvantages of category 2 methods lie in the requirement for
homologous regions. It is important that these overhanging regions are free from
secondary structures, which could otherwise hamper their annealing to the
neighboring fragments. If secondary structures are unavoidable, it may be beneficial to
choose a method with a higher reaction temperature, as the proposed annealing
temperatures differ between the three presented methods (SLIC: 20 °C, ITA:
50 °C, CPEC: 55-65 °C).

Category 3 or in vivo homology-based methods take advantage of the inherent DNA
repair and homologous recombination machinery of the yeast Saccharomyces
cerevisiae. The parts to be assembled are transformed into yeast together with a
shuttle vector containing a yeast origin of replication and a selection marker, and cells
are plated onto selective media. As with the in vitro homology-based methods, the
modules are generated by PCR with oligonucleotides containing the overlapping
regions. Modules consisting of double-stranded DNA are then transformed into the
yeast, and ssDNA regions are exposed by yeast exonucleases. The exposed
ssDNA is then bound by RPA (the yeast single-stranded DNA binding protein), which
resolves any secondary structures [153]; double-strand break repair mechanisms
subsequently join the homologous regions, generating a plasmid that can express
the selection marker [154]. To reduce false positives, it is important to choose a
selection marker that has no homologous sequences in the host strain.
This method has been used successfully with up to 38 pieces at once [155].

Although yeast assembly is easy to design and relatively simple to implement, one of
its major disadvantages for library generation followed by expression in other hosts is
the limited recovery of plasmid DNA from yeast cells (~1 µg of DNA per
10^10-10^11 cells [156]). Additionally, it is critical that the assembled construct is not
toxic to yeast. Furthermore, the vectors must contain replication sequences and
resistance markers for both hosts, or the assembly must be sub-cloned into an
appropriate vector afterwards, which requires that the final assembled fragment is
free of recognition sites for the restriction enzymes used for sub-cloning.

For both category 2 and category 3 methods, it is critical that the homologous regions
are unique to each part; otherwise, unwanted assembly can occur, leading to constructs
with missing parts or parts assembled in an incorrect order. Accordingly, the major
advantage of category 1 methods over homology-based methods is that repeating
elements can easily be used without generating undesired fragments.
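A basic design check for homology-based assembly is that no two junction overlaps are identical, including in reverse-complement orientation, since a fragment end could otherwise anneal to the wrong partner. The sketch below flags exact duplicates only; practical design tools also look for partial similarity and secondary structure, which this hypothetical helper does not.

```python
from collections import Counter


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def canonical(seq: str) -> str:
    """Orientation-independent key: the smaller of a sequence and its reverse complement."""
    seq = seq.upper()
    return min(seq, reverse_complement(seq))


def non_unique_junctions(junctions: dict[str, str]) -> list[str]:
    """Names of junctions whose overlap sequence is shared with another junction.

    junctions: mapping of junction name -> overlap sequence. Both strands are
    compared, because an overlap can also pair with a reverse-complementary one.
    """
    counts = Counter(canonical(s) for s in junctions.values())
    return sorted(name for name, s in junctions.items() if counts[canonical(s)] > 1)


# Hypothetical overlaps: j3 is the reverse complement of j1.
overlaps = {"j1": "AAAACCCCGGGG", "j2": "ATATCGCGTTTT", "j3": "CCCCGGGGTTTT"}
print(non_unique_junctions(overlaps))  # ['j1', 'j3']
```

A repeated promoter or RBS used at several junctions would be flagged in the same way, which illustrates why repeating elements are easier to handle with category 1 methods.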

In general, each method has inherent advantages and disadvantages, and to 
generate complicated assemblies it may be beneficial to combine various methods 
to achieve the desired construct. ITA and yeast assembly have been used in 
combination to clone the Mycoplasma genitalium genome [157], and ITA, a 
“scarless-stitching method,” and Goldengate were used in combination to refactor 
the nitrogen fixation pathway of Klebsiella oxytoca [138]. 

In summary, a great deal of effort has been invested in developing techniques for
rapid, efficient, and easy assembly of desired constructs. Yeast assembly and ITA
have been used for library generation of yeast plasmids to optimize production of
xylose or components of the violacein pathway, respectively [118, 119, 158].
Overlap extension PCR has been used to generate a library of the mammalian
calmodulin central linker for expression and purification in E. coli [159]. Fragment
exchange, which combines restriction digestion at Type IIS sites with annealing of
homologous regions, has been used to screen for novel bioactive agents in E. coli
[160, 161]. It remains to be seen whether these methods can also be used to
efficiently generate libraries for the production of desired products in E. coli.

In addition to methods that assemble plasmids, genome-based efforts have also
contributed to the engineering of strains that can be exploited for use in in vitro
systems. Traditionally, these methods have included random radiation or chemical
mutagenesis followed by rounds of screening. However, recent advances in
large-scale genome engineering allow for a directed approach. This approach relies
on co-expression of the λ Red recombination system from bacteriophage λ and
transformation of E. coli with short DNA sequences bearing the desired mutation
[162, 163]. These oligonucleotides can then act as Okazaki fragments on the
lagging strand, introducing mutations during replication [58]. Efforts to automate
oligo-mediated recombineering have made it possible to target multiple sites at once
in a relatively high-throughput manner. This has been used to generate strains with
improved product formation by modifying ribosome binding sites [58] or promoters
[164]. It has also been used to insert hexa-histidine tags to allow facile purification
of the components of an entire pathway [57]. In addition, the efficiency of such an
approach can be increased by also using CRISPR-Cas9 to remove unwanted parental
genotypes, thus specifically enriching engineered sequences [165]. Together, the
in vitro and in vivo DNA assembly and engineering efforts described here can help
optimize pathways for in vivo, cell-free extract, or in vitro applications.
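Why repeated cycles of oligo-mediated recombineering are used can be sketched with a simple probability model: if each cycle edits a given site with an independent probability p, the chance that the site is edited after n cycles is 1 - (1 - p)^n. Independence between cycles and sites, constant p, and no reversion are simplifying assumptions; the per-cycle efficiency below is a hypothetical value, and counterselection such as CRISPR-Cas9 would shift outcomes beyond this model.

```python
def fraction_modified(p_per_cycle: float, cycles: int) -> float:
    """Probability that one target site carries the edit after `cycles` rounds.

    Assumes a constant, independent per-cycle replacement probability and no reversion.
    """
    return 1.0 - (1.0 - p_per_cycle) ** cycles


def expected_sites_modified(n_sites: int, p_per_cycle: float, cycles: int) -> float:
    """Expected number of edited sites per cell when n_sites are targeted at once."""
    return n_sites * fraction_modified(p_per_cycle, cycles)


def fraction_all_modified(n_sites: int, p_per_cycle: float, cycles: int) -> float:
    """Fraction of cells in which every targeted site is edited (sites independent)."""
    return fraction_modified(p_per_cycle, cycles) ** n_sites


# Hypothetical: 5 RBS targets, 20 % replacement per site per cycle, 10 cycles.
print(round(expected_sites_modified(5, 0.2, 10), 2))  # 4.46
```

The gap between the expected number of edited sites and the fraction of cells carrying all edits is what makes multiplexed recombineering useful for generating combinatorial diversity rather than a single fully edited genotype.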

5 Summary


In this chapter we have summarized a number of examples of cascade reactions for 
products with very diverse applications, ranging in scale from small (pharmaceu- 
tically active ingredients) to large (biofuels) and in nature from simple (molecular 
hydrogen) via stereochemically challenging (optically pure fine chemicals) to 
complex polymers. For all these examples, the attractiveness of the approach lies 
in the fact that multiple enzymes are brought together without undesired spatial 
separation in one vessel and under the same set of environmental circumstances, 
allowing the efficient performance of complex chemistry including the use of 
thermodynamically unfavorable reactions. This focus on multiple members of the
reaction changes the nature of the operation to that of a system with emergent
properties. Consequently, the methods applied to constructing and optimizing such
enzyme cascade reactions need to be adapted, from identifying suitable members of
such systems to assembling and optimizing them. Many of the required methods can
be obtained from in vivo synthetic biology and its efforts in pathway refactoring.
Over the years we have become very good at manipulating DNA, and exploiting this
ability to find the best-performing composition of a cascade reaction, by producing
its members in vivo in broadly different compositions, seems a natural way to
optimize cascades. However, the cell-free character also allows the introduction of
novel elements, such as advanced analytical strategies to track the performance of
cascades.

Clearly, cascades, in particular larger cascades, need to be scaled beyond the
few available examples [11, 12] to demonstrate their ultimate value for
(bio)chemistry beyond proof of principle. However, very promising approaches, for
example using thermophilic enzymes, are available, which suggests that scaling
issues can ultimately be overcome. Here, it might serve to recall that enzyme
processes are among the bioprocesses with the highest product volumes (production
of high fructose corn syrup employing glucose isomerase, with an annual production
volume of 10^7 tons [166, 167]). This augurs well for the future of this
promising approach.

Synthetic Biology of Polyhydroxyalkanoates (PHA)


De-Chuan Meng and Guo-Qiang Chen 


Abstract Microbial polyhydroxyalkanoates (PHA) are a family of biodegradable
and biocompatible polyesters that have been extensively studied using synthetic
biology and metabolic engineering methods to improve their production and to
widen their diversity. Synthetic biology has allowed PHA to be produced as
composition-controllable random copolymers, homopolymers, and block copolymers.
Recent developments showed that it is possible to establish a microbial platform for
producing not only random copolymers with controllable monomers and monomer
ratios but also structurally defined homopolymers and block copolymers. This was
achieved by engineering the genome of Pseudomonas putida or Pseudomonas
entomophila to weaken the β-oxidation and in situ fatty acid synthesis pathways,
so that a fatty acid fed to the bacteria maintains its original chain length and
structure when incorporated into the PHA chains. The engineered bacterium
allows functional groups in a fatty acid to be introduced into PHA, forming
functional PHA, which, upon grafting, can generate an almost unlimited variety of
PHA. Recombinant Escherichia coli was also engineered to efficiently produce
poly(3-hydroxypropionate) or P3HP, the strongest member of the PHA family.
Synthesis pathways for P3HP and for its copolymer with 3-hydroxybutyrate, P3HB3HP,
were assembled to allow their synthesis from glucose. CRISPRi was also successfully
used to simultaneously manipulate multiple genes and control metabolic flux in E. coli,
yielding a series of P3HB4HB copolymers of 3-hydroxybutyrate (3HB) and
4-hydroxybutyrate (4HB). Bacterial shapes were also successfully engineered for
enhanced PHA accumulation.

1 Introduction


Polyhydroxyalkanoates (PHA) are a family of structurally diverse intracellular
biopolyesters accumulated by many microorganisms [1-3]. Because their properties
are similar to those of traditional petroleum-based plastics, PHA have been developed
for applications in the packaging, medicine, pharmacy, agriculture, and food industries
[4-6]. Compared with other well-known biodegradable or bio-based polymers associated
with lower CO₂ emissions, such as polylactide (PLA), PHA have a much wider monomer
diversity, with over 150 structural variants reported [7, 8].

Based on monomer length, PHA monomers are divided into short-chain-length
(scl) monomers consisting of 3-5 carbon atoms and medium-chain-length (mcl)
monomers of 6-14 carbon atoms (Fig. 1) [8, 9]. Based on the composition of the
monomers and their arrangement, PHA have been classified into homopolymers
consisting of one monomer, random copolymers of two or more different monomers,
and block copolymers of at least two homopolymers connected by covalent bond(s)
(Fig. 2) [9, 10]. The microstructures and monomer compositions of PHA affect their
thermal and physical properties, which in turn affect their applications (Table 1)
[11, 12]. For example, the most studied PHA family member, poly(3-hydroxybutyrate)
or P3HB, first reported in 1926 [13], is very brittle owing to its high crystallinity,
which limits its applications [14]. In many cases, precise control of PHA structure is
difficult to achieve. For example, random copolymers consisting of 3-hydroxyhexanoate
(3HHx or C6), 3-hydroxyoctanoate (3HO or C8), 3-hydroxydecanoate (3HD or C10), and
3-hydroxydodecanoate (3HDD or C12) are always formed when a fatty acid is added to
cultures of Pseudomonads belonging to rRNA homology group I, because β-oxidation in
Pseudomonas spp. always shortens C12 to C10, C8, and C6 [15]. In addition, the in situ
fatty acid synthesis pathway, although it supplies PHA monomers at a lower rate than
β-oxidation, also provides various monomers for PHA synthesis [16], leading to PHA
consisting of various monomers in random copolymers. Traditional PHA such as PHB,
PHBV, and PHBHHx produced by wild-type microorganisms still suffer from high cost
and poor properties, and researchers are using synthetic biology and metabolic
engineering to develop novel methods that lower the cost of PHA or to discover novel
PHA with better properties or high value-added applications [17]. In many cases,
precursors such as fatty acids, alcohols, or functional monomers are expensive, and
new pathways are being established to synthesize PHA monomers in vivo from
low-cost glucose [16, 18]. This approach is very important if PHA are to be produced
on an industrial scale [19]. Recent advances in systems biology have increased the
amount of information that can be collected, and synthetic biology tools are providing
modeling and molecular implementation methods, promising to move microbial
engineering from an iterative approach to a design-oriented paradigm [20].

Fig. 1 Traditional PHA monomers. 3HP 3-hydroxypropionate, 3HB 3-hydroxybutyrate, 3HV
3-hydroxyvalerate, 3HHx 3-hydroxyhexanoate, 3HO 3-hydroxyoctanoate, 3HD
3-hydroxydecanoate, 3HDD 3-hydroxydodecanoate, 3HTD 3-hydroxytetradecanoate

Fig. 2 PHA molecular structures [9]

Table 1 Physical characterization of PHA and traditional petroleum-based plastics [4, 11]

                   Thermal properties       Mechanical properties
PHA                Tm (°C)    Tg (°C)       σmax (MPa)      εb (%)
P3HPᵃ              78.13      −17.85        21.54 ± 1.10    497.6 ± 6.2
P4HBᵃ              61         −47           34.66 ± 0.98    1,000
P3HBᵃ              171.8      3.1           18.0 ± 0.7      3.0 ± 0.4
PHBVᵇ              114        −5            26              27
PHBHHxᵃ            125        0             7.0 ± 0.5       400 ± 36
Polypropyleneᵇ     170        -             34              400
Polystyreneᵇ       110        -             50              -

P3HP poly(3-hydroxypropionate), P4HB poly(4-hydroxybutyrate), P3HB poly(3-hydroxybutyrate),
PHBV poly(3-hydroxybutyrate-co-20 mol% 3-hydroxyvalerate), PHBHHx
poly(3-hydroxybutyrate-co-12 mol% 3-hydroxyhexanoate), Tm melting temperature, Tg glass
transition temperature, σmax maximum tensile strength, εb elongation at break

ᵃ Physical properties of P3HP, P4HB, P3HB, and PHBHHx [11]

ᵇ Physical properties of PHBV, polypropylene, and polystyrene [4]
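The Pseudomonas example above, in which a fed C12 fatty acid ends up as a random copolymer of C12, C10, C8, and C6 monomers, can be illustrated with a minimal sketch. It assumes that each β-oxidation cycle removes exactly two carbons and that every intermediate down to C6 can be diverted into PHA; the C6 cut-off mirrors the text rather than a modeled enzyme specificity. The classification follows the scl (3-5 C) and mcl (6-14 C) definitions given in the introduction.

```python
def beta_oxidation_products(chain_length: int, shortest: int = 6) -> list[int]:
    """Chain lengths (C atoms) of the 3-hydroxyacyl monomers expected when
    beta-oxidation shortens a fed fatty acid by two carbons per cycle.

    `shortest` is an assumed cut-off matching the C6 end point described in the text.
    """
    return list(range(chain_length, shortest - 1, -2))


def classify_monomer(chain_length: int) -> str:
    """scl: 3-5 carbon atoms; mcl: 6-14 carbon atoms."""
    if 3 <= chain_length <= 5:
        return "scl"
    if 6 <= chain_length <= 14:
        return "mcl"
    return "outside scl/mcl"


monomers = beta_oxidation_products(12)
print(monomers)  # [12, 10, 8, 6]
print({c: classify_monomer(c) for c in monomers})
```

In strains with weakened β-oxidation, the list would collapse toward the fed chain length alone, which is the basis for producing structurally defined homopolymers.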


2 Metabolic Pathways of PHA Synthesis 


Many bacteria have been found to produce various polyhydroxyalkanoate (PHA)
biopolyesters [8]. For example, Ralstonia eutropha has mostly been studied for producing
PHB and PHBV [21], and Pseudomonas putida is well known for synthesizing
mcl-PHA [22, 23]. The specificity of the PHA synthase (PhaC) is the most important
element determining PHA monomer composition in different microorganisms
[24-26]. PhaC from Ralstonia eutropha is known to polymerize PHA monomers with
chain lengths of three (C3) to five (C5) carbons, termed short-chain-length PHA or
scl PHA [27], including poly(3-hydroxypropionate) (P3HP) [28, 29],
poly(3-hydroxybutyrate) (PHB) [30], poly(4-hydroxybutyrate) (P4HB) [31, 32],
poly(3-hydroxyvalerate) (PHV) [33], and copolymers of 3-hydroxypropionate and
4-hydroxybutyrate (P3HP4HB) [11], as well as the similar copolymers P3HB4HB [18],
P3HP3HB [34], and PHBV [5, 8]. Many Pseudomonas spp. contain PhaCs that can
polymerize monomers of six (C6) to fourteen (C14) carbons to form
medium-chain-length PHA (or mcl PHA) [35]. Only a few bacteria have been found to
have PhaCs that can polymerize C4 to C14 monomers to form scl-mcl copolymers
[36, 37]. Wild-type Ralstonia eutropha H16 can only produce scl PHA; however, when
the PHA synthase gene phaC2Ps from Pseudomonas stutzeri strain 1317 was introduced
into the PHA synthase (phbC)-negative mutant R. eutropha PHB-4, the recombinant
R. eutropha gained the ability to synthesize mcl PHA. During cultivation on gluconate,
the presence of phaC2Ps in R. eutropha PHB-4 led to the accumulation of PHB
homopolymer at 40.9 wt% of cell dry weight. When fatty acids were used as carbon
sources, the recombinant produced PHA copolyesters containing both scl and mcl
monomers of 4-12 carbon atoms in length. When it was cultivated on a mixture of
gluconate and a fatty acid, the monomer composition of the accumulated PHA was
strongly affected, and the monomer content was easily regulated by adding fatty
acids to the cultivation medium [36]. A series of optimization strategies have been
reported for the PHA synthase PhaC2Ps in E. coli: codon optimization of the gene and
mRNA stabilization with a hairpin structure were conducted, and the function of the
optimized PHA
synthase was tested in E. coli. The transcript was more stable after the hairpin 
structure was introduced, both codon optimization and hairpin introduction increas- 
ing the protein expression level compared with the wild-type PhaC2p, The opti- 
mized PhaC2p, increased PHB production by approximately 16-fold to 30% of the 
cell dry weight. When grown on dodecanoate, the recombinant E. coli harboring the 
optimized gene phaC2p,O with a hairpin structure in the 5’ untranslated region was 
able to synthesize fourfold more PHA, consisting of 3HB and mcl 3HA, compared 
to the recombinant harboring the wild-type phaC2p, [38]. 
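The chain-length windows quoted above can be turned into a small lookup that shows which monomers each synthase would accept. This is a minimal Python sketch under a simplifying assumption: each PhaC is treated as accepting every 3-hydroxyalkanoate inside its carbon window and nothing outside it, whereas real synthases show graded preferences. The names `SYNTHASE_RANGES`, `chain_class`, and `accepted` are illustrative, and the C4–C12 window for PhaC2Ps is inferred from the copolymers described above.

```python
# Hypothetical carbon-number windows taken from the text (approximate, all-or-nothing).
SYNTHASE_RANGES = {
    "PhaC (R. eutropha)": (3, 5),            # scl PHA
    "PhaC (Pseudomonas spp.)": (6, 14),      # mcl PHA
    "PhaC2Ps (P. stutzeri 1317)": (4, 12),   # scl-mcl copolymers
}

# 3-hydroxyalkanoate monomers and their carbon numbers.
monomers = {"3HP": 3, "3HB": 4, "3HV": 5, "3HHx": 6, "3HO": 8,
            "3HD": 10, "3HDD": 12, "3HTD": 14}


def chain_class(carbons: int) -> str:
    """Classify a monomer as scl (C3-C5) or mcl (C6-C14)."""
    if 3 <= carbons <= 5:
        return "scl"
    if 6 <= carbons <= 14:
        return "mcl"
    return "outside scl/mcl range"


def accepted(synthase: str, pool: dict[str, int]) -> list[str]:
    """Return the monomers whose chain length lies inside the synthase window."""
    low, high = SYNTHASE_RANGES[synthase]
    return [name for name, c in pool.items() if low <= c <= high]


for synthase in SYNTHASE_RANGES:
    picked = accepted(synthase, monomers)
    classes = sorted({chain_class(monomers[m]) for m in picked})
    print(synthase, picked, classes)
```

Under these windows only the PhaC2Ps entry spans both classes, which matches its ability to make scl-mcl copolymers such as P(3HB-co-mcl 3HA).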

The authors’ group summarized the metabolic pathways leading to PHA formation
in a map (Fig. 3). The most studied PHA synthesis pathways are discussed in the
following. Pathway I leads from sugar to scl PHA, especially PHB: glucose is used as
carbon source to produce acetyl-CoA first, which is then metabolized to
acetoacetyl-CoA and 3-hydroxybutyryl-CoA before entering polymerization to form
PHB. Recombinant E. coli also showed high PHA productivity after the phaCAB
operon from Ralstonia eutropha was introduced. Based on this pathway, more
synthetic pathways were developed to produce PHA with other structures [18, 34].
Pathway II begins with fatty acid(s) as substrate entering the β-oxidation cycle,
leading to formation of R-3-hydroxyacyl-CoA monomers, mostly for mcl PHA
synthesis [39]. Pathway III directs acetyl-CoA to malonyl-CoA and then to
3-ketoacyl-ACP for forming R-3-hydroxyacyl-CoA monomers [40, 41]. Glucose
was also used as carbon source to produce novel high-value PHA such as P3HP,
which is discussed later [34]. The types of PHA formed depend not only on
monomer supply pathways but also on the specificity of PHA synthases. Generally,
a PhaC with low substrate specificity allows the formation of diverse PHA structures
[36]. As the properties of copolymers of scl PHA and mcl PHA are drawing more
attention, much work is focusing on producing scl-co-mcl PHA using PhaCs with low
substrate specificity [25, 42, 43].

Fig. 3 Metabolic pathways for PHA synthesis [9]


3 Diversity of PHA 


Diversity of PHA has been explored not only through monomer variations but also
through the composition of PHA, especially PHA main-chain structures (Table 2). PHA
was first discovered in the form of poly-3-hydroxybutyrate (PHB) in the last
century [13]. The monomers 3-hydroxyvalerate (3HV) and 3-hydroxyhexanoate
(3HHx) were detected as PHA components in bacteria from activated sewage
sludge in the 1970s [44]. Some 10–15 years later, Pseudomonas oleovorans
was found to produce a series of PHA containing 3-hydroxyhexanoate
(3HHx), 3-hydroxyoctanoate (3HO), 3-hydroxydecanoate (3HD), and
3-hydroxydodecanoate (3HDD) when grown on different alkanes or fatty acids as
substrates [16, 33]. In the following years, scientists began to modify PHA
pathways or to introduce them into better hosts. For example, the
phaCAB operon for PHB production was transformed into E. coli, and this
non-PHA-producing bacterium showed high PHB productivity with the heterologous
PHB pathway [30]. An increasing number of novel PHA were synthesized,
mostly from structurally related substrates, and in 1995, 91 different hydroxyalkanoic
acids were reported as PHA monomers [8]. PHA diversity was further increased
by producing functional PHA grafted with other chemicals and polymers [45–47].
From then on, the diversity of PHA was further expanded to include PHA polymer
chains with various microstructures, such as homopolymers, random copolymers,
block copolymers, block-random copolymers, functional polymers, graft polymers,
and thiopolyesters, as well as their various combinations [10, 11, 45]. Among these
diverse PHA, grafted polymers can most easily be extended to even wider
diversity, a topic that requires further elucidation [46, 48]. However,
only very few PHA are commercially available for application development,
including PHB, PHBV, P3HB4HB, and PHBHHx. All other PHA have been
prepared by individual laboratories around the world in very small amounts, out of
academic curiosity. How to accelerate the discovery and deployment of
advanced PHA materials has been a central question for all PHA researchers and
stakeholders. The answer depends on the availability of diverse PHA in sufficient
quantities for studies of their thermal and mechanical properties and of their other
application potential. Establishing platforms that supply diverse PHA in sufficient
quantities for such developments should be a global effort.

Fig. 3 (continued) ThrAC, threonine synthase; 19: IlvA, threonine deaminase; 20: PhaA,
β-ketothiolase; 21: PhaB, NADPH-dependent acetoacetyl-CoA reductase; 22: SucD, succinic
semialdehyde dehydrogenase; 23: 4hbD, 4-hydroxybutyrate dehydrogenase; 24: OrfZ,
4-hydroxybutyrate-CoA transferase; 25: PhaC1Ps6-19, PHA synthase from Pseudomonas
sp. MBEL 6-19; 26: fadB, S-3-hydroxyacyl-CoA dehydrogenase; 27: fadA, 3-ketothiolase; 28:
PhaJ, enoyl-CoA hydratase; 29: epimerase; 30: YqeF/FadA, thiolase; 31: FadB, hydroxyacyl-CoA
dehydrogenase/enoyl-CoA hydratase; 32: YdiO, enoyl-CoA reductase, Ter, trans-2-enoyl-CoA
reductase from Treponema denticola; 33: PhaC, type II PHA synthase; 34: β-ketoacyl-ACP
synthase; 35: β-ketoacyl-ACP reductase; 36: β-hydroxyacyl-ACP dehydrase; 37: enoyl-ACP
reductase; 38: PhaG, 3-hydroxyacyl-acyl carrier protein-coenzyme A transferase; 39: PhaC1
(STQK), PHA synthase derived from Pseudomonas sp. 61-3 PHA synthase; 40: engineered
PhaC1Ps6-19, PHA synthase

Table 2 Diversity of PHA [9]

Types | Polymer structures
Homopolymers | PHB, P3HP, P4HB, PHV, PTE, PLA, P3HHx, P3HHp, P3HO, P3HD, P3HDD, P3HPhV, P3HPE, PHU, P3H6PHx
Random copolymers | P(3HB-co-3HV), P(3HB-co-4HB), P(3HB-co-3HHx), P(3HP-co-4HB), P(3HB-co-3HP), P(3HB-co-mcl 3HA), P(3HHx-co-3HO-3HD-3HDD), P(3HB-co-LA)
Block copolymers | P3HB-b-P3HBV, P3HB-b-4HB, P3HP-b-4HB, P3HB-b-3HHx, P3HB-b-3HP, P3HHx-b-P(3HD3HDD)
Graft polymers | PS-g-PHA, PMMA-g-PHA, PHA-g-PAA, PHA-g-AA-CS, PHA-g-AA-COS, PHA-g-Cellulose, PEG-g-PHA, PEGMA-g-PHO, PLA-g-PHA, VI-g-PHO, GDD-g-PHO, PHOU-g-Jeffamine, PHOU-g-POSS, PHBV-g-PVK, PHBV-g-PA

3HB 3-hydroxybutyrate, 3HP 3-hydroxypropionate, 4HB 4-hydroxybutyrate, 3HV
3-hydroxyvalerate, PTE polythioester, PLA polylactic acid, 3HHx 3-hydroxyhexanoate, 3HHp
3-hydroxyheptanoate, 3HO 3-hydroxyoctanoate, 3HD 3-hydroxydecanoate, 3HDD
3-hydroxydodecanoate, 3HPhV 3-hydroxy-5-phenylvalerate, 3HPE 3-hydroxy-4-pentenoic acid,
PHU polyhydroxyundecenoate, 3H6PHx 3-hydroxy-6-phenylhexanoate, PS-g-PHA poly(styrene
peroxide)-g-PHA, PMMA-g-PHA poly(methyl methacrylate peroxide)-g-PHA, PHA-g-PAA
PHA-g-poly(acrylic acid), PHB-g-AA/starch acrylic acid grafted poly(3-hydroxybutyric acid)/starch,
PHA-g-AA-CS PHA-g-AA-chitosan, PHA-g-AA-COS PHA-g-AA-chitooligosaccharide,
PEG-g-PHA poly(ethylene glycol)-g-PHA, PEGMA-g-PHO monoacrylate-poly(ethylene glycol)-g-PHO,
PLA-g-PHA poly(lactic acid)-g-PHA, VI-g-PHO vinylimidazole-grafted
poly(3-hydroxyoctanoate), GDD-g-PHO glycerol 1,3-diglycerol diacrylate-g-PHO,
PHOU-g-Jeffamine PHOU-g-α-amino-ω-methoxy poly(oxyethylene-co-oxypropylene) or
(Jeffamine®)-g-PHOU, PHOU-g-POSS PHOU-g-polyhedral oligomeric silsesquioxane,
PHBV-g-PVK PHBV-g-poly(phenyl vinyl ketone), PHBV-g-PA PHBV-g-poly(acrylamide)
3.1 Homopolymers


So far, only a limited number of homopolymers have been reported, including the scl
PHA PHB [49], P3HP [34], P4HB [32], microbial polylactic acid (PLA) [42], and PHV
[33]; the mcl PHA P3HHx, P3HHp (poly(3-hydroxyheptanoate)) [50], P3HO, P3HD,
P3HDD, and P3HTD (poly(3-hydroxytetradecanoate)) [35, 51]; and the functional PHA
poly(3-hydroxy-5-phenylvalerate) or P(3HPhV) [47], poly(3-hydroxy-4-pentenoate)
[46], poly(3-hydroxy-10-undecenoate) [52], and poly(3-hydroxy-6-phenylhexanoate)
[48]. With successful engineering of the β-oxidation pathway, more and more
homopolymers can be synthesized. Pseudomonas putida KT2442 normally produces mcl
PHA consisting of 3HHx, 3HO, 3HD, 3HDD, and 3HTD; when its β-oxidation-related
genes fadA, fadB, fadB2x, and fadAx, together with phaG, were knocked out, the
mutant P. putida KTQQ20 synthesized the homopolymer poly(3-hydroxydecanoate)
(PHD) when grown on decanoic acid [35]. The mcl PHA producer Pseudomonas
entomophila L48 was also studied for homopolymer production after the genes
encoding 3-hydroxyacyl-CoA dehydrogenase, 3-ketoacyl-CoA thiolase, and acetyl-CoA
acetyltransferase in its β-oxidation pathway were knocked out. The mutant
P. entomophila LAC26 accumulated over 90 wt% PHA consisting of 99 mol% 3HDD
using dodecanoic acid as carbon source. The β-oxidation-inhibited mutant of
P. entomophila was also used to produce the benzene-ring-containing PHA
poly(3-hydroxy-5-phenylvalerate) from 5-phenylvaleric acid, and the homopolymer
poly(3-hydroxy-9-decenoate) from 9-decenol [45, 47]. Synthetic biology also makes it
possible to create novel PHA with designed structures and compositions.


3.2 Random Copolymers


Most commercially produced PHA are random copolymers, including
P(3HB-co-3HV) or PHBV, P(3HB-co-4HB) or P3HB4HB, and P(3HB-co-3HHx)
or PHBHHx, which have been produced on an industrial scale [2]. The mcl PHA
copolymer P(3HHx-co-3HO-co-3HD-co-3HDD) is commonly synthesized
by many pseudomonads belonging to rRNA homology group I [39], but it is too
soft for any application [53]. Recently, the random copolymers P(3HP-co-4HB)
[11], P(3HB-co-3HP) [34], poly(3HB-co-3MP) [54], and P(3HB-co-LA) [42] were
found to be accumulated by recombinant E. coli, and these copolymers demonstrated
improved properties over existing ones. However, their production yields still need
to be improved before industrial-scale production.
3.3 Block Copolymers


Pederson et al. [10] reported the first PHA block copolymer, PHB-b-PHBV, and
the material was found to resist ageing. Block copolymerization is a way of
controlling the thermodynamic behavior of a polymer, and block copolymers can
withstand the ageing effect that makes a polymer material brittle
[10]. Since 2011, the authors’ lab and other groups have succeeded in making
a series of diblock copolymers, including PHB-b-P3HVHHp [55], PHB-b-P4HB
[56], PHB-b-PHHx [57], P3HB-b-P3HP [58], P3HP-b-P4HB [59], and
P3HHx-b-P(3HD-co-3HDD) [60]. Sequential feeding of two or more structurally
related carbon substrates led to the biosynthesis of block copolymers. For example,
by first feeding 1,3-propanediol and later adding 1,4-butanediol to cultures, the
engineered E. coli synthesized the block copolymer P3HP-b-P4HB [59]. All the
diblock copolymers were found to have one or more properties superior to those of
the corresponding homopolymers, random copolymers, or polymer blends. The
compositions of diblock copolymers can be adjusted through the monomer substrate
ratios in the feeds, leading to adjustable polymer properties. Although multiblock
PHA remain difficult to synthesize, advances in synthetic biology should make it
possible to control monomer composition accurately and then to produce block PHA
with diverse structures on a larger scale.
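The difference between sequential feeding (block copolymer) and co-feeding (random copolymer) can be illustrated with a toy chain-growth model. This is a hypothetical Python sketch, not a kinetic model from the cited work: it assumes a single chain that keeps growing through every feeding phase, adds one monomer per step in proportion to the precursor supply of the current phase, and ignores chain termination, mixed chain populations, and residual precursor pools. `grow_chain` and `blocks` are illustrative names.

```python
import random


def grow_chain(schedule, seed=0):
    """Toy model: append one monomer per step, drawn in proportion to supply.

    schedule: list of (steps, {monomer: relative_supply}) feeding phases.
    """
    rng = random.Random(seed)
    chain = []
    for steps, supply in schedule:
        names = list(supply)
        weights = [supply[n] for n in names]
        for _ in range(steps):
            chain.append(rng.choices(names, weights=weights)[0])
    return chain


def blocks(chain):
    """Collapse a chain into (monomer, run_length) segments."""
    runs = []
    for unit in chain:
        if runs and runs[-1][0] == unit:
            runs[-1][1] += 1
        else:
            runs.append([unit, 1])
    return [tuple(r) for r in runs]


# Sequential feeding: PDO-derived 3HP first, BDO-derived 4HB later.
sequential = grow_chain([(6, {"3HP": 1.0}), (6, {"4HB": 1.0})])
print(blocks(sequential))  # [('3HP', 6), ('4HB', 6)]

# Co-feeding both precursors gives a statistically random sequence.
mixed = grow_chain([(12, {"3HP": 1.0, "4HB": 1.0})], seed=1)
print(blocks(mixed))
```

Changing the relative supply values in the co-feeding phase shifts the average composition, which loosely mirrors how substrate ratios in the feed are used to tune copolymer composition.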


3.4 Graft Polymers 


As functional groups such as double or triple bonds, epoxy, carbonyl, cyano, phenyl,
and halogen groups can be introduced into PHA chains [46], graft PHA polymers
can be formed by attaching small molecules or larger polymers to the PHA side
chains, leading to dramatic changes in PHA properties. So far, successful PHA graft
polymers include poly(styrene peroxide)-g-PHA or PS-g-PHA [61], poly(methyl
methacrylate peroxide)-g-PHA or PMMA-g-PHA [62], PHA-g-poly(acrylic acid)
or PHA-g-PAA [63], PHA-g-AA-chitooligosaccharide or PHA-g-AA-COS [64],
PHA-g-Cellulose [65], poly(ethylene glycol)-g-PHA or PEG-g-PHA [66],
monoacrylate-poly(ethylene glycol)-g-PHO or PEGMA-g-PHO [67], poly(lactic
acid)-g-PHA or PLA-g-PHA [68], vinylimidazole-g-PHO or VI-g-PHO [69],
glycerol-1,3-diglycerol diacrylate-g-PHO or GDD-g-PHO [70], (Jeffamine®)-g-PHOU
or PHOU-g-Jeffamine, i.e., PHOU-g-α-amino-ω-methoxy
poly(oxyethylene-co-oxypropylene) [71], PHOU-g-polyhedral oligomeric
silsesquioxane or PHOU-g-POSS [72], PHBV-g-poly(phenyl vinyl ketone) or
PHBV-g-PVK [73], and PHBV-g-poly(acrylamide) or PHBV-g-PA [74]. Graft
copolymers were mostly synthesized by chemical modification. For example,
side-chain carboxylic groups of PHA were coupled with the terminal hydroxyl groups
of methoxy-poly(ethylene glycol) (MePEG) or methoxy-poly(lactic acid) (MePLA) in
the presence of N,N′-dicyclohexylcarbodiimide (DCC) [68]. There are endless
possibilities for creating new graft PHA homo- or copolymers.
4 Engineering Pathways for Controlling PHA Biosynthesis


4.1 Pathways for scl PHA 


Microbial metabolic engineering has been exploited as a powerful approach for
enhanced production of novel polyesters, and a pathway designed and assembled
with a synthetic biology approach can also precisely control PHA composition.
Recombinant E. coli efficiently produced poly(4-hydroxybutyrate) (P4HB) from
glucose as the sole carbon source when a pathway was established containing genes
encoding the succinic semialdehyde dehydrogenase of Clostridium kluyveri and the
PHB synthase of Ralstonia eutropha, combined with inactivation of the native
succinate semialdehyde dehydrogenase genes sad and gabD to enhance the carbon
flux toward P4HB biosynthesis [32]. When the PHB accumulation pathway of
Ralstonia eutropha was co-expressed with the P4HB synthesis pathway, the
recombinant E. coli produced P(3HB-co-4HB) from glucose [18].

Aeromonas hydrophila 4AK4 normally produces the copolyester PHBHHx.
Recombinant A. hydrophila 4AK4 expressing vgb and fadD, encoding
Vitreoscilla hemoglobin and E. coli acyl-CoA synthetase, respectively, was found
to produce the homopolymer poly(3-hydroxyvalerate) (PHV) (C5) using undecanoic
acid as the sole carbon source [75]. The 3-hydroxyvalerate monomer
can also be supplied via the threonine degradation pathway. Recently, it became
possible to produce PHA containing 2-hydroxybutyrate [76] or lactate [42]. In
addition, P3HP can be produced from 1,3-propanediol [29], from glycerol alone [77],
and from glucose as the sole carbon source [34].

The PHA synthesis genes phbC and orfZ, cloned from Ralstonia eutropha H16 and
Clostridium kluyveri, respectively, were transformed into the β-oxidation-weakened
Pseudomonas putida KTOY08ΔGC, a mutant of P. putida KT2442, and the
resulting strain, termed KTHH06, produced the P3HB-b-P4HB diblock
copolymer [56].


4.2 Synthesis of Poly(3-hydroxypropionate-co-4-hydroxybutyrate) with Fully Controllable Structures by Recombinant Escherichia coli Containing an Engineered Pathway


Recently, microbial copolyesters containing 3HP have attracted increasing interest
because of the ultrahigh strength conferred by 3HP; these include P(3HB-co-3HP),
P(3HP-co-3HB-co-3HH-co-3HO), P(4HB-co-3HP-co-Lactate), P(4HB-co-3HP-co-2HP),
P(3HB-co-3HP-4HB-co-Lactate), and P(3HB-co-3HP-co-4HB-co-2HP) [78]. Natural
bacteria are unable to produce 3-hydroxypropionate (3HP) and 4-hydroxybutyrate
(4HB) as building blocks for PHA synthase to make the unnatural biopolyester
P(3HP-co-4HB) [11]. However, the precursors of 3HP and 4HB can be derived from
1,3-propanediol (PDO) [29] and 1,4-butanediol (BDO) [79], respectively. Copolyesters
of 3HP and 4HB, abbreviated as P(3HP-co-4HB), were synthesized by E. coli
harboring a synthetic pathway consisting of five heterologous genes: orfZ encoding
4-hydroxybutyrate-coenzyme A transferase from Clostridium kluyveri [80, 81], pcs’
encoding the ACS domain of the trifunctional propionyl-CoA ligase (PCS) from
Chloroflexus aurantiacus [82], dhaT and aldD encoding 1,3-propanediol
dehydrogenase and aldehyde dehydrogenase from Pseudomonas putida KT2442 [83],
and phaC1 encoding PHA synthase from Ralstonia eutropha (Fig. 4) [11, 29]. When
the strain was grown on mixtures of PDO and BDO, the 4HB content of microbial
P(3HP-co-4HB) was controllable from 12 mol% to 82 mol%, depending on the
PDO:BDO ratio. Mechanical and thermal properties changed markedly with the
monomer ratios (Table 3). Morphologically, P(3HP-co-4HB) films only became fully
transparent when the 4HB content was around 67 mol% (Fig. 5) [11].

Several key enzymes were considered important for making P(3HP-co-4HB)
copolymers with flexible 4HB content. Propionyl-CoA synthetase (PCS’) from the
3-hydroxypropionate cycle of the phototrophic green non-sulfur eubacterium
Chloroflexus aurantiacus is very likely to convert 3HP to 3HP-CoA, and the
4HB-coenzyme A transferase gene orfZ from Clostridium kluyveri was found to
convert 4HB into 4HB-CoA effectively [82]. The genes dhaT and aldD were found to
convert 1,4-butanediol (BDO) and/or 1,3-propanediol (PDO) into 4HB and/or 3HP,
respectively. The enzyme encoded by dhaT was mostly active with substrates
containing two primary alcohol groups separated by one or two carbon atoms, such
as 1,3-propanediol or 1,4-butanediol, and 3HP and/or 4HB yields were affected by the
expression levels of these two genes [29, 79, 83]. The promoter of the PHA synthesis
operon phaCAB from Ralstonia eutropha (PRe) was shown to be transcriptionally
more active in E. coli than the lac or T7 promoter. Finally, the PHA synthase PhaC1
of R. eutropha has sufficient activity for polymerizing scl PHA monomers [29, 40].

Fig. 4 Construction of P(3HP-co-4HB) biosynthetic pathways in recombinant Escherichia coli
[11]. Enzymes for each numbered step are as follows: (1) 1,3-propanediol dehydrogenase; (2)
aldehyde dehydrogenase; (3) propanoyl-CoA synthetase; (4) 4-hydroxybutyrate coenzyme A
transferase; (5) PHA synthase

Fig. 5 Transparency of P(3HP-co-4HB) consisting of different monomer compositions
[11]. From left to right: P(3HP), P(3HP-co-12 mol% 4HB), P(3HP-co-25 mol% 4HB),
P(3HP-co-38 mol% 4HB), P(3HP-co-67 mol% 4HB), P(3HP-co-82 mol% 4HB), P(4HB)

Feeding a mixture of PDO and BDO to cultures of the recombinant E. coli S17-1
resulted in the formation of P(3HP-co-4HB) copolyesters. The 3HP and 4HB contents
of P(3HP-co-4HB) could be adjusted by changing the PDO:BDO ratio. For example,
63 wt% P(17 mol% 3HP-co-83 mol% 4HB) was accumulated at a PDO:BDO ratio of
1:10, whereas a ratio of 1:1 led to the formation of P(70 mol% 3HP-co-30 mol% 4HB).
When the PDO:BDO ratio was 10:15 (or 2:3), only 2.3 wt% P(88 mol% 3HP-co-12
mol% 4HB) was synthesized, indicating the toxicity of high BDO or PDO
concentrations. In particular, when the total concentration of BDO and PDO
exceeded 20 g/L, the toxicity became very obvious, as indicated by a significant
reduction in CDW and PHA production. A copolymer with a defined 3HP:4HB ratio
can therefore be produced by adjusting the PDO:BDO ratio; in this study,
P(3HP-co-4HB) containing 17–88 mol% 3HP was obtained. Interestingly, the
transparency of P(3HP-co-4HB) also depended on monomer composition. Only
P(3HP-co-67 mol% 4HB) was totally transparent, whereas the other PHA, including
P(3HP), P(3HP-co-12 mol% 4HB), P(3HP-co-25 mol% 4HB), P(3HP-co-38 mol% 4HB),
P(3HP-co-82 mol% 4HB), and P(4HB), were less transparent [11].
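To compare the feed with the polymer, the PDO:BDO ratios can be converted into a molar BDO share and set against the reported 4HB content. This Python sketch assumes the ratios are mass ratios (the text does not state the basis) and uses standard molar masses of the two diols; `bdo_mole_fraction` and `too_toxic` are hypothetical helper names, and the 20 g/L threshold is the total diol level quoted above.

```python
# Molar masses (g/mol) of the diol precursors.
MW_PDO = 76.09   # 1,3-propanediol, C3H8O2
MW_BDO = 90.12   # 1,4-butanediol, C4H10O2
TOXIC_TOTAL_G_PER_L = 20.0  # total diol level above which growth dropped sharply


def bdo_mole_fraction(pdo_mass: float, bdo_mass: float) -> float:
    """BDO share of the diol feed on a molar basis (inputs assumed to be masses)."""
    n_pdo = pdo_mass / MW_PDO
    n_bdo = bdo_mass / MW_BDO
    return n_bdo / (n_pdo + n_bdo)


def too_toxic(pdo_g_per_l: float, bdo_g_per_l: float) -> bool:
    """Flag feeds whose combined diol concentration exceeds the quoted threshold."""
    return pdo_g_per_l + bdo_g_per_l > TOXIC_TOTAL_G_PER_L


# (PDO, BDO) feed ratio and the 4HB content (mol%) reported in the text.
observations = [((1, 10), 83), ((1, 1), 30)]

for (pdo, bdo), observed_4hb in observations:
    feed = 100 * bdo_mole_fraction(pdo, bdo)
    print(f"PDO:BDO = {pdo}:{bdo}  feed BDO {feed:.1f} mol%  polymer 4HB {observed_4hb} mol%")
```

Under the mass-ratio assumption, the 4HB content of the polymer lies below the molar BDO share of the feed at both ratios (about 89 and 46 mol% BDO versus 83 and 30 mol% 4HB); two data points are not enough to say whether uptake, conversion, or synthase preference is responsible.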

Incorporating 4HB monomers into P3HP to form P(3HP-co-4HB) clearly lowered the
melting temperature (Tm) and the glass transition temperature (Tg) of P3HP, from
78°C and −18°C to 61–65°C and from −24°C to −41°C, respectively, as the 4HB
content increased from 12 mol% to 67 mol% (Table 3). Interestingly,
P(3HP-co-82 mol% 4HB) was found to have a much lower Tm of 36°C and a higher Tg
of −29°C compared with the other copolymers. Tm seemed to stabilize at around
63°C in copolymers containing 12–67 mol% 4HB, while Tg decreased from −24°C to
−42°C as the 4HB content increased from 12 mol% to 67 mol%. Homopolymer P4HB
had the lowest Tg, −47°C, with a Tm of 61°C (Table 3).
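A common first check on glass transition data for random copolymers is the Fox equation, 1/Tg = w1/Tg1 + w2/Tg2, with weight fractions w and temperatures in kelvin. The Fox equation is not used in the source; it is a generic approximation applied here as an illustrative assumption. The sketch converts mol% 4HB into weight fractions from the repeat-unit masses (3HP unit C3H4O2, 4HB unit C4H6O2) and takes the homopolymer Tg values quoted above.

```python
M_3HP = 72.06  # repeat unit C3H4O2, g/mol
M_4HB = 86.09  # repeat unit C4H6O2, g/mol
TG_P3HP_K = -18.0 + 273.15  # homopolymer Tg values quoted in the text
TG_P4HB_K = -47.0 + 273.15


def weight_fraction_4hb(mol_frac_4hb: float) -> float:
    """Convert a 4HB mole fraction into a 4HB weight fraction."""
    m4 = mol_frac_4hb * M_4HB
    m3 = (1.0 - mol_frac_4hb) * M_3HP
    return m4 / (m3 + m4)


def fox_tg_celsius(mol_frac_4hb: float) -> float:
    """Fox estimate of copolymer Tg: 1/Tg = w3HP/Tg(P3HP) + w4HB/Tg(P4HB)."""
    w4 = weight_fraction_4hb(mol_frac_4hb)
    inv_tg = (1.0 - w4) / TG_P3HP_K + w4 / TG_P4HB_K
    return 1.0 / inv_tg - 273.15


for x in (0.12, 0.67):
    print(f"{x:.0%} 4HB -> Fox Tg estimate {fox_tg_celsius(x):.1f} °C")
```

The estimates (about −22.5°C and −39°C) fall near the reported −24°C and about −41 to −42°C, but a simple mixing rule like this says nothing about Tm and does not account for the unusual Tg of the 82 mol% 4HB sample.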

Copolymerization reduced the yield strength and Young’s modulus relative to both
P3HP and P4HB (Table 3). However, elongation at break improved for
P(3HP-co-4HB) containing 12–38 mol% 4HB compared with P3HP and P4HB. On the
other hand, only P(3HP-co-12 mol% 4HB) showed a higher maximum tensile
strength than the other homo- and copolymers. In terms of thermal and mechanical
properties, P(3HP-co-4HB) appears to offer a unique combination of properties
compared with commercial PHA such as PHB, PHBV, and PHBHHx.

As PDO and BDO can each be biosynthesized from glucose [84, 85], it
becomes possible to establish an engineered pathway for producing P(3HP-co-4HB).
Block copolymers of P3HB-b-P3HP could also be produced [59]. The two pathways
supplied 3HP and 4HB monomers independently, leading to the formation of the
homopolymer P3HP in the absence of 4HB, of P4HB in the absence of 3HP, or of
random copolymers P(3HP-co-4HB) when both 3HP and 4HB were available.


4.3 Poly(3-hydroxybutyrate-co-3-hydroxypropionate) from Glucose by Engineering Escherichia coli


Poly(3-hydroxypropionate) (P3HP), an scl PHA whose three-carbon monomer has no
side chain, shows the best combined mechanical properties, including an elongation
at break of more than 600% and a Young’s modulus of 3 GPa [11]. P3HP therefore
stands out as a highly promising PHA. No microorganism is known to synthesize
homopolymer P3HP naturally, so recombinant strains have been developed to
produce it. Andreessen et al. [28] first reported bacterial synthesis of P3HP using
glycerol as carbon source in a two-step fed-batch fermentation. Wang et al. [77]
modified the process by replacing the strictly anaerobic glycerol dehydratase from
Clostridium butyricum with the vitamin B12-dependent glycerol dehydratase
DhaB123 from Klebsiella pneumoniae. Zhou et al. [29] used 1,3-propanediol as a
precursor to produce P3HP at over 90% of E. coli cell dry weight (CDW). There
were also attempts to synthesize P3HP from an unrelated carbon source via
acetyl-CoA [86, 87]. That pathway involves carboxylation of acetyl-CoA to
malonyl-CoA, reduction of malonyl-CoA to 3HP, activation of 3HP to 3HP-CoA,
and polymerization. This recombinant pathway led to only 1.32 g/L CDW
containing 0.98% P3HP [87].

The authors’ lab assembled multiple genes from various sources into a new pathway
for producing P3HP from glucose as the sole carbon source, including gpd1
(glycerol-3-P dehydrogenase) and gpp2 (glycerol-3-P phosphatase) from
Saccharomyces cerevisiae [88, 89], dhaB1-3 (glycerol dehydratase) and gdrAB
(glycerol dehydratase reactivating factor) from Klebsiella pneumoniae [90, 91], pduP
(propionaldehyde dehydrogenase) from Salmonella typhimurium [92, 93], phaC (PHA
synthase) from Ralstonia eutropha [26, 94], aldD (aldehyde dehydrogenase) and dhaT
(1,3-propanediol dehydrogenase) from Pseudomonas putida KT2442 [79, 83], and pcs’
(propanoyl-CoA synthetase) from Chloroflexus aurantiacus [29]. When a plasmid
containing these genes was transformed into E. coli, up to 18.4% P3HP homopolymer
was produced from glucose (Fig. 6) [34].

Fig. 6 Construction of P3HP and P(3HB-co-3HP) biosynthetic pathways from glucose as a sole
carbon source in recombinant Escherichia coli [34]. Enzymes encoded by each gene are described
below: gpd1 glycerol-3-P dehydrogenase (Saccharomyces cerevisiae), gpp2 glycerol-3-P
phosphatase (Saccharomyces cerevisiae), dhaB1-3 glycerol dehydratase (Klebsiella pneumoniae),
gdrAB glycerol dehydratase reactivating factors (Klebsiella pneumoniae), pduP propionaldehyde
dehydrogenase (Salmonella typhimurium), phaC polyhydroxyalkanoate synthase (Ralstonia
eutropha), aldD aldehyde dehydrogenase (Pseudomonas putida), dhaT 1,3-propanediol
dehydrogenase (Pseudomonas putida), pcs’ propanoyl-CoA synthetase (Chloroflexus aurantiacus),
phaA β-ketothiolase (Ralstonia eutropha), phaB NADPH-dependent acetoacetyl-CoA reductase
(Ralstonia eutropha)

Expression of gpd1 and gpp2 allows dihydroxyacetone phosphate from glucose
glycolysis to form glycerol-3-phosphate, which is further hydrolyzed to glycerol [95].
Glycerol is converted by glycerol dehydratase (DhaB1-3) from Klebsiella pneumoniae
to 3-hydroxypropionaldehyde, a key intermediate for P3HP, and
3-hydroxypropionaldehyde is converted to 3-hydroxypropionate (3HP) by aldehyde
dehydrogenase (AldD) cloned from Pseudomonas putida KT2442. Propionyl-CoA
synthetase (PCS’) from Chloroflexus aurantiacus should then convert 3HP to
3HP-CoA. Alternatively, 3-hydroxypropionaldehyde can be converted directly into
3HP-CoA by propionaldehyde dehydrogenase (PduP) from Salmonella typhimurium.
To increase the activity of glycerol dehydratase, gdrAB, encoding a reactivation
factor for glycerol dehydratase, was added to the pathway. When pduP replaced aldD
and dhaT, the resulting plasmid pDC02 became the only plasmid containing the
entire pathway from glucose to P3HP. Recombinant E. coli Trans1-T1 (pDC02)
produced over 18% P3HP in over 5 g/L CDW when grown in LB medium with
glucose, whereas in glucose mineral medium 12% P3HP was accumulated in 3 g/L
CDW. More P3HP accumulation from glucose is expected once the metabolic flux is
further optimized [34].
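Because polymer content is reported as a percentage of cell dry weight, the volumetric titer is simply CDW multiplied by content. The sketch below applies that arithmetic to the figures quoted in this section; it assumes the percentages are wt% of CDW and treats the "over 18%" and "over 5 g/L" values as lower bounds. The route labels and the function name `p3hp_titer` are illustrative.

```python
def p3hp_titer(cdw_g_per_l: float, content_percent: float) -> float:
    """Volumetric P3HP titer (g/L) = cell dry weight x polymer content (wt% of CDW)."""
    return cdw_g_per_l * content_percent / 100.0


# (CDW in g/L, P3HP content in wt%) as quoted in the text.
routes = {
    "malonyl-CoA route [87]": (1.32, 0.98),
    "pDC02, LB medium with glucose [34]": (5.0, 18.0),   # lower bounds
    "pDC02, glucose mineral medium [34]": (3.0, 12.0),
}

for name, (cdw, content) in routes.items():
    print(f"{name}: about {p3hp_titer(cdw, content):.3f} g/L P3HP")
```

On these numbers the glycerol-based route from glucose yields roughly 0.36 to at least 0.9 g/L P3HP, against about 0.013 g/L for the malonyl-CoA route, a difference of well over an order of magnitude.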

When a P3HB synthesis pathway containing the P3HB synthesis operon phaCAB from
Ralstonia eutropha was added to the P3HP synthesis pathway, the recombinant
harboring both the P3HB and P3HP pathways produced random copolymers P3HB3HP
from glucose as the sole carbon source. This study demonstrated that ultra-strong
polyhydroxyalkanoates (PHA), mainly P3HP and P3HB3HP, can be synthesized from
low-cost glucose using synthetic biology approaches.
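The routes described above form a small directed graph from glucose to PHA, so the alternative branches can be enumerated mechanically. This Python sketch is a hypothetical encoding of the text and Fig. 6, not a stoichiometric model: each edge lists only the product and the enzyme named in the text, and glycolysis is collapsed into single steps. `PATHWAY` and `routes` are illustrative names.

```python
# substrate -> list of (product, enzyme) steps, as described in the text.
PATHWAY = {
    "glucose": [("dihydroxyacetone phosphate", "glycolysis"),
                ("acetyl-CoA", "glycolysis")],
    "dihydroxyacetone phosphate": [("glycerol-3-phosphate", "Gpd1")],
    "glycerol-3-phosphate": [("glycerol", "Gpp2")],
    "glycerol": [("3-hydroxypropionaldehyde", "DhaB1-3 + GdrAB")],
    "3-hydroxypropionaldehyde": [("3HP", "AldD"), ("3HP-CoA", "PduP")],
    "3HP": [("3HP-CoA", "PCS'")],
    "3HP-CoA": [("PHA", "PhaC")],
    "acetyl-CoA": [("acetoacetyl-CoA", "PhaA")],
    "acetoacetyl-CoA": [("3HB-CoA", "PhaB")],
    "3HB-CoA": [("PHA", "PhaC")],
}


def routes(graph, start, goal, path=()):
    """Yield every enzyme sequence from start to goal (graph must be acyclic)."""
    if start == goal:
        yield list(path)
        return
    for product, enzyme in graph.get(start, []):
        yield from routes(graph, product, goal, path + (enzyme,))


for r in routes(PATHWAY, "glucose", "3HP-CoA"):
    print(" -> ".join(r))
```

The PduP branch is one step shorter than the AldD/PCS’ branch because it skips free 3HP, which is consistent with pduP replacing aldD and dhaT in pDC02. Enumerating routes to "PHA" instead adds the phaCAB branch that supplies 3HB-CoA for the copolymer.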

The two plasmids p15apCAB and pDC02, which harbor three and nine genes from
different microorganisms responsible for P3HB and P3HP syntheses from glucose,
respectively, can be regarded as bio-devices or bio-bricks that are assembled to
perform their functions (Fig. 7). This study can serve as a typical synthetic biology
example that uses bio-bricks or bio-devices to achieve biological functions, in this
case the production of novel bio-polyesters. In total, 11 heterologous genes were
cloned from other microorganisms and assembled into new pathways to meet these
new demands.
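The gene count can be checked with simple set arithmetic. The gene lists below are inferred from the pathway description and Fig. 6 (an assumption, not a published plasmid map): pDC02 is taken to carry gpd1, gpp2, dhaB1-3, gdrAB, pduP, and phaC, and p15apCAB the phaCAB operon, with phaC from R. eutropha on both.

```python
# Inferred gene contents (hypothetical; not taken from a plasmid map).
P15APCAB = {"phaC", "phaA", "phaB"}
PDC02 = {"gpd1", "gpp2", "dhaB1", "dhaB2", "dhaB3", "gdrA", "gdrB", "pduP", "phaC"}

print(len(P15APCAB), len(PDC02))   # 3 9
print(len(P15APCAB | PDC02))       # 11 distinct heterologous genes
print(sorted(P15APCAB & PDC02))    # ['phaC'] is shared by both plasmids
```

Counting the shared phaC only once reconciles the three-plus-nine genes on the two plasmids with the total of 11 heterologous genes.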

In the future, the two polyester synthesis pathways could be codon-optimized and
transferred into industrial microbial hosts to enhance P3HB3HP production [34].


Fig. 7 Orders of gene arrangements on plasmids pDC02 and p15apCAB, respectively [34]
4.4 Engineering the β-Oxidation Pathway on the Chromosome for mcl PHA Synthesis


Many Pseudomonas spp. are able to utilize fatty acids via β-oxidation to obtain
both energy and substrates for cell growth. The β-oxidation pathway shortens the
fatty acid chain by two carbon atoms in each cycle, generating PHA monomers of
several different lengths, which results in the formation of random PHA
copolymers (Fig. 8).
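The effect of weakening β-oxidation on monomer supply can be sketched in a few lines. This is an idealized Python illustration, not a model from the source: intact β-oxidation is assumed to pass every intermediate down to C6 to the synthase (as in the C12 to C10, C8, and C6 shortening described earlier), while a fully weakened strain is assumed to keep the fatty acid at its original length. Real mutants retain some residual activity (for example, 99 rather than 100 mol% 3HDD). `beta_oxidation_monomers` and `MONOMER_NAMES` are hypothetical names.

```python
MONOMER_NAMES = {6: "3HHx", 8: "3HO", 10: "3HD", 12: "3HDD", 14: "3HTD"}


def beta_oxidation_monomers(carbons: int, weakened: bool, min_carbons: int = 6) -> list[int]:
    """Chain lengths of 3-hydroxyacyl-CoA monomers expected from one fatty acid.

    Intact beta-oxidation removes two carbons (as acetyl-CoA) per cycle, so every
    intermediate down to min_carbons can reach PhaC. In the idealized weakened
    strain, the fatty acid keeps its original length.
    """
    if weakened:
        return [carbons]
    lengths = []
    while carbons >= min_carbons:
        lengths.append(carbons)
        carbons -= 2
    return lengths


for weakened in (False, True):
    names = [MONOMER_NAMES[c] for c in beta_oxidation_monomers(12, weakened)]
    print("weakened" if weakened else "wild type", names)
```

With dodecanoate the intact pathway predicts a 3HDD/3HD/3HO/3HHx random copolymer, while the weakened case predicts a 3HDD homopolymer, mirroring the behavior reported for P. entomophila LAC26 on dodecanoic acid and P. putida KTQQ20 on decanoic acid.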

Recently, the authors’ lab succeeded in engineering the β-oxidation pathways
encoded on the chromosomes of Pseudomonas putida and Pseudomonas
entomophila, achieving controllable PHA composition, including the formation of
PHA homopolymers, composition-adjustable random copolymers, and block
copolymers [35, 51, 60]. To prevent the fatty acid substrates from being shortened,
chromosomal genes related to β-oxidation were selectively deleted to weaken
β-oxidation in Pseudomonas spp., so that the fatty acids retain their structures
when used as PHA monomer precursors.

Mutant Pseudomonas putida KTQQ20 is a derivative of P. putida KT2442 in which
the key fatty acid degradation genes fadB, fadA, fadB2x, and fadAx, as well as
PP2047 and PP2048 (encoding 3-hydroxyacyl-CoA dehydrogenase and acyl-CoA
dehydrogenase, respectively), were deleted together with phaG (encoding
3-hydroxyacyl-CoA-acyl carrier protein transferase), rendering the strain defective
in fatty acid β-oxidation. The strain was able to synthesize the homopolymer
poly(3-hydroxydecanoate) (PHD) and P(3HD-co-84 mol% 3HDD) when grown on
decanoic acid or dodecanoic acid, respectively [35]. When grown on mixtures of
sodium hexanoate (C6) and sodium decanoate (C10), it produced random
copolymers of P(3HHx-co-3HD) whose monomer compositions were easily
regulated by varying the C6:C10 ratio. P. putida KTQQ20 also produced the
diblock copolymer P3HHx-b-P(3HD-co-3HDD) when sodium hexanoate (C6) and
sodium decanoate (C10) were fed to its culture one after another [60].

Fig. 8 The weakened β-oxidation cycle, reversed fatty acid β-oxidation cycle and in situ
fatty acid synthesis. Enzymes in the β-oxidation cycle: FadD fatty acid-CoA ligase, FadE acyl-CoA
dehydrogenase, FadBa S-enoyl-CoA hydratase, FadB 3-hydroxyacyl-CoA dehydrogenase, FadA
acetyl-CoA acetyltransferase, PhaJ R-enoyl-CoA hydratase, PhaC PHA synthase, PhaG
3-hydroxyacyl-CoA-acyl carrier protein transferase. Genes in the reversed fatty acid β-oxidation
cycle: yqeF/fadA thiolase, fadB hydroxyacyl-CoA dehydrogenase/enoyl-CoA hydratase, ydiO
enoyl-CoA reductase, ter trans-2-enoyl-CoA reductase, tesA/tesB/yciA thioesterase

Pseudomonas entomophila strain L48, a strong fatty acid utilizer, was also
investigated for microbial production of mcl PHA. A total of 70.2% of
P. entomophila genes have orthologs in the P. putida genome, of which >96%
are found in synteny. The β-oxidation activity of P. entomophila was weakened by
deleting genes on its chromosome similar to those deleted in P. putida. The resulting
P. entomophila LAC26 accumulated over 90 wt% PHA consisting of 99 mol%
3HDD. Homopolymers of C6 to C14 monomers were each accumulated when the
mutant was fed a fatty acid of the same chain length [51].
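Compositions in this section are reported both as wt% (PHA content of cell dry weight) and as mol% (monomer composition of the polymer). Because 3HD and 3HDD repeat units differ in molar mass, mol% and weight fraction of a monomer are not the same number. The sketch below converts between the two; it assumes repeat-unit formulas C10H18O2 (3HD) and C12H22O2 (3HDD) from the -O-CH(R)-CH2-C(=O)- backbone and standard atomic masses, and all names are illustrative.

```python
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999}

# Assumed repeat-unit formulas: backbone -O-CH(R)-CH2-C(=O)- plus side chain R
REPEAT_UNITS = {
    "3HD": {"C": 10, "H": 18, "O": 2},   # R = heptyl
    "3HDD": {"C": 12, "H": 22, "O": 2},  # R = nonyl
}


def molar_mass(formula: dict[str, int]) -> float:
    return sum(ATOMIC_MASS[el] * n for el, n in formula.items())


def mol_to_wt_percent(mol_fractions: dict[str, float]) -> dict[str, float]:
    """Convert monomer mol fractions (any scale) to weight percent."""
    masses = {m: x * molar_mass(REPEAT_UNITS[m]) for m, x in mol_fractions.items()}
    total = sum(masses.values())
    return {m: 100.0 * g / total for m, g in masses.items()}


def wt_to_mol_percent(wt_fractions: dict[str, float]) -> dict[str, float]:
    """Convert monomer weight fractions (any scale) to mol percent."""
    moles = {m: w / molar_mass(REPEAT_UNITS[m]) for m, w in wt_fractions.items()}
    total = sum(moles.values())
    return {m: 100.0 * n / total for m, n in moles.items()}


# P(3HD-co-84 mol% 3HDD): weight fraction of each monomer
wt = mol_to_wt_percent({"3HD": 16, "3HDD": 84})
print({m: round(v, 1) for m, v in wt.items()})
```

Under these assumptions, 84 mol% 3HDD corresponds to roughly 86% 3HDD by weight.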


4.5 Pathways for scl and mcl PHA Copolymers 


P. putida KTOYO6 is a fatty acid β-oxidation impaired mutant in which the genes of
3-ketoacyl-CoA thiolase (fadA) and 3-hydroxyacyl-CoA dehydrogenase (fadB)
were deleted as far as possible to improve fatty acid utilization for PHA
synthesis [53]. When its mcl PHA synthase (C6–C14) was replaced by the less
specific synthase operon phaPCJAc from Aeromonas caviae, which could synthesize
both scl and mcl monomers (C3–C7), the recombinant P. putida KTOYO6ΔC
(phaPCJAc) was able to produce a diblock copolymer of PHB-b-PHVHHp by
controlling the sequential feeding time of sodium butyrate and sodium heptanoate.
When cultivated on mixtures of sodium butyrate (C4) and sodium hexanoate (C6),
it accumulated random copolymers of P(3HB-co-3HHx) with monomer
contents adjustable by the C4:C6 ratio [55].


5 Functional PHA 


When cultures of engineered strains, such as P. putida KTQQ20 or P. entomophila
LAC23, were fed with fatty acids containing functional groups such as double or
triple bonds, epoxy, carbonyl, cyano, phenyl, and halogen groups [46],
the resulting PHA carried the functional groups on its side chains, allowing
further chemical modification (grafting) of the side chains.

Homopolymers with 100 mol% content of aromatic moieties, random copoly- 
mers, or a blend of both have been produced [47]. Hydrophilic PHA bearing alkoxy, 
acetoxy, or hydroxyl groups are also of great interest, as they show enhanced 
solubility and biocompatibility [46]. 

The β-oxidation weakened P. entomophila LAC23 was found able to accumulate
PHA containing phenyl groups on the side chains. When cultured on
5-phenylvaleric acid, it synthesized only the homopolymer
poly(3-hydroxy-5-phenylvalerate). Copolyesters of 3-hydroxy-5-phenylvalerate
(3HPhV) and 3-hydroxydodecanoate (3HDD) were also successfully produced by
P. entomophila LAC23 when grown on mixtures of 5-phenylvaleric acid and
dodecanoic acid. The 3HPhV content of P(3HPhV-co-3HDD) was controllable,
ranging from 3% to 32% depending on the dodecanoic acid:5-phenylvaleric acid
ratio [47]. However, the production of PHA with functional groups still faces
high costs and low productivity, and the toxicity of the substrates also affects
microbial growth; PHA with functional groups therefore needs to be produced
from unrelated carbon sources in future studies.


6 Engineering the Bacterial PHA Synthesis Using CRISPRi 


Clustered regularly interspaced short palindromic repeats interference (CRISPRi)
is a powerful technology used to regulate eukaryotic genomes [96]. CRISPRi has
also been reported to control PHA biosynthesis pathway flux and to adjust PHA
composition. First, an E. coli strain was engineered by introducing a pathway for
the production of P3HB4HB from glucose [18]. The native gene sad, encoding
succinate semi-aldehyde dehydrogenase, was regulated by CRISPRi using five
specially designed single guide RNAs (sgRNAs) to control carbon flux
toward 4-hydroxybutyrate (4HB) biosynthesis in E. coli. The system allowed
formation of P3HB4HB containing 1–9 mol% 4HB. Additionally, the genes encoding
succinyl-CoA synthetase (sucC and sucD) and succinate dehydrogenase (sdhA and
sdhB) were repressed by CRISPRi using selected sgRNAs such as sucC2, sucD2,
sdhB2, and sdhA1, so that carbon flux was channeled preferentially to the 4HB
precursor. The resulting 4HB content in P3HB4HB could be adjusted from
1.4 mol% to 18.4 mol% depending on the expression levels of the downregulated
genes (Fig. 9). The results show that CRISPRi is a feasible approach
to simultaneously manipulate multiple genes and control metabolic flux in
E. coli [97].

Fig. 9 CRISPRi as a tool to control P3HB4HB biosynthesis pathway flux and to adjust 3HB/4HB
composition [97]. Engineered pathways for P3HB4HB synthesis by recombinant Escherichia coli.
The CRISPRi system was used to repress gene transcription initiation and elongation in the related
pathways. To obtain P3HB4HB with various 4HB ratios, several genes can be manipulated
simultaneously, including the following genes: phaA beta-ketothiolase, phaB NADPH-dependent
acetoacetyl-CoA reductase, phaC PHA synthase, sucD succinate semi-aldehyde dehydrogenase,
4hbD 4-hydroxybutyrate dehydrogenase, orfZ 4-hydroxybutyrate CoA transferase


7 Engineering the Bacterial Shapes for Enhanced 
Polyhydroxyalkanoates Accumulation 


Most bacteria are small, ranging from 0.5 μm to 2 μm, which prevents the
cells from accumulating large amounts of inclusion bodies intracellularly,
even though they are able to grow very fast. To overcome this size limitation,
bacterial cells need to be made larger, providing more intracellular
space for inclusion body accumulation. Various approaches were
taken to increase bacterial cell size, including deletion of the actin-like protein
gene mreB, weak expression of mreB in an mreB deletion mutant, and weak expression
of mreB in an mreB deletion mutant under inducible expression of sulA, which encodes
an inhibitor of the division ring protein FtsZ. All of these methods increased
bacterial size and PHB granule accumulation to different extents [98].

MreB, an actin-like bacterial cytoskeletal protein that also affects bacterial
morphology, was considered a suitable engineering target for expanding cell
volume [99]. When mreB was deleted, E. coli changed from rods to spherical
shapes, and some cells even increased in size to diameters of around 10 μm.
More PHB granules accumulated in the large E. coli JM109SG (ΔmreB) cells.
However, E. coli JM109SG (ΔmreB) also appeared to be fragile, and a fraction of
cells ruptured during the growth stage, suggesting that MreB may
provide critical support for maintaining cell shape. Ectopic expression of MreB
in a wild-type bacterium was found to interfere with normal MreB cytoskeleton
formation, resulting in a larger cell size than that of the wild type. To
increase the cell size further, mreB was weakly and constitutively expressed in the
mreB-deleted E. coli JM109SG, together with an arabinose-inducible sulA gene
encoding an inhibitor of cell division ring (FtsZ ring) formation, whose
overexpression leads to elongated cells. Remarkably, PHB accumulation increased
by over 100% in recombinant E. coli overexpressing mreB in an mreB deletion mutant
under inducible expression of SulA, the FtsZ-inhibiting protein (Fig. 10). The
enlarged cell size was found to be directly related to a weakened cytoskeleton
resulting from a broken cytoskeletal helix [98]. Larger E. coli cells therefore make
it possible to produce more PHA.

Fig. 10 Electron microscopy studies on morphology and PHB production by E. coli JM109SG
(ΔmreB) overexpressing mreB [98]. (a) Schematic of PHB accumulation in E. coli JM109SG
(ΔmreB) overexpressing mreB. Scale bar: 0.5 μm; (b) Growth and PHB accumulation by
recombinants harboring pBHR68 cultivated in minimal medium at 30°C for 10 h followed by
addition of 20 g/L glucose and another 40 h of growth. Error bars: s.d. (n = 3). E. coli JM109SG
(pTK/pBHR68) (c) and E. coli JM109SG (ΔmreB/pTK-mreB/pBHR68) (d) were grown in LB
medium at 30°C for 10 h followed by addition of 20 g/L glucose and another 40 h of growth. TEM
on sections of cells of control E. coli JM109SG (pTK/pBHR68) (e) and E. coli JM109SG
(ΔmreB/pTK-mreB/pBHR68) (f) cultivated in the LB medium at 30°C for 10 h, followed by
addition of 20 g/L glucose and another 40 h of growth. Scale bar: 5 μm


8 Conclusion


The application of PHA as a low-cost biodegradable plastic has been hampered by
its high production cost and the difficulty of precisely controlling its structure and
properties. Global efforts have been made to develop technology for lowering
PHA production cost. With the successful construction of β-oxidation weakened
Pseudomonas spp. as PHA production platforms, it is possible to control the
formation of homopolymers and random and block copolymers, including monomer
structures and ratios, which allows us to obtain PHA with consistent
properties. At the same time, various functional groups can be introduced into the
PHA side chains in a quantitative way, which provides more opportunities for
side-chain grafting. Functional PHA, together with the many possibilities for
grafting, has provided numerous ways of making new PHA, possibly with some high
value-added functionalities. With the development of synthetic biology, it also
becomes possible to construct unnatural pathways to produce novel PHA with
strong value-added properties. It is widely expected that within 5 or 10 years, many
novel properties including environmental responsiveness, shape memory,
controllable biodegradability, and mechanical ultra-strength will be developed
from the diverse PHA materials. Thus, synthetic biology is opening a new
golden era for PHA.


Engineering and Evolution of Saccharomyces
cerevisiae to Produce Biofuels and Chemicals 


Timothy L. Turner, Heejin Kim, In Iok Kong, Jing-Jing Liu, 
Guo-Chang Zhang, and Yong-Su Jin 


Abstract To mitigate global climate change caused partly by the use of fossil 
fuels, the production of fuels and chemicals from renewable biomass has been 
attempted. The conversion of various sugars from renewable biomass into biofuels 
by engineered baker’s yeast (Saccharomyces cerevisiae) is one major direction 
which has grown dramatically in recent years. As well as shifting away from fossil 
fuels, the production of commodity chemicals by engineered S. cerevisiae has also 
increased significantly. The traditional approaches of biochemical and metabolic 
engineering to develop economic bioconversion processes in laboratory and indus- 
trial settings have been accelerated by rapid advancements in the areas of yeast 
genomics, synthetic biology, and systems biology. Together, these innovations 
have resulted in rapid and efficient manipulation of S. cerevisiae to expand fer- 
mentable substrates and diversify value-added products. Here, we discuss recent 
and major advances in rational (relying on prior experimentally derived knowledge)
and combinatorial (relying on high-throughput screening and genomics)
approaches to engineer S. cerevisiae for producing ethanol, butanol, 
2,3-butanediol, fatty acid ethyl esters, isoprenoids, organic acids, rare sugars, 
antioxidants, and sugar alcohols from glucose, xylose, cellobiose, galactose, ace- 
tate, alginate, mannitol, arabinose, and lactose. 


1 Introduction


As human society has grown and developed, our demand for fuels and commodity 
chemicals has accelerated. This demand has manifested as many different outputs 
for both fuels and chemicals. For fuels, we have two major categories: transporta- 
tion fuels and non-transportation fuels. Here we mainly discuss transportation fuels, 
which are currently primarily derived from non-renewable fossil fuels. These 
hydrocarbons, such as coal, petroleum, or natural gas, are processed into gasoline, 
ethanol, jet fuel, or other specialized products [1]. Approximately 80% of energy 
use by humans is derived from fossil fuels, with up to 58% consumed for transpor- 
tation [2, 3]. Because the rate of natural production of fossil fuels has for decades 
been increasingly outpaced by humanity’s usage, renewable alternatives for trans- 
portation fuels are considered a societal necessity [1]. 

As with fuels, many non-fuel chemicals are produced using non-renewable fossil 
fuel feedstocks. This petrochemical-based system is non-renewable and, as with 
fuels, an alternative method of production is needed to allow for continued 
advancement of human society. In particular, the petrochemical industry produces 
chemicals used in nearly every industry on Earth. Many bulk chemicals, such as 
ethylene and propylene, are produced at scales of 1–100 million tons per year
[4]. The specific uses of these chemicals can vary greatly: in some cases, such as 
artemisinic acid, only one major use is currently considered (as a precursor to an 
antimalarial drug) [5], whereas other chemicals, such as lactic acid, have numerous 
uses, including as a plastic precursor or as a food preservative [6]. Collectively, 
reliable and sustainable industrial production of chemicals is a necessity for ongo- 
ing human progress. 

The finite supply of fossil fuels [7, 8], the risks associated with harvesting
hard-to-obtain fossil fuels [9-11], and the concerns about manmade climate change
related to fossil fuel use [12–15] have collectively pushed researchers and
governments toward producing fuels and chemicals from renewable biomass by
engineered microbes [16, 17]. Although many microbes have been studied for the 
production of renewable fuels and chemicals, yeasts, Saccharomyces cerevisiae in 
particular, have served as major platform microbes for many of these studies. 

S. cerevisiae, also known as brewer’s yeast, is a well-studied microorganism, 
even beyond its traditional use for the production of beer and other fermented foods 
and beverages [18]. Extensive tools exist for the manipulation and engineering of 
yeasts [19-22]. These tools make it possible to exploit the advantages of
S. cerevisiae: its native ability to grow in minimal medium, its generally
recognized as safe (GRAS) status, and its tolerance to low pH and acidic
conditions [23, 24]. With these tools and inherent physiological advantages,
the production of fuels and chemicals from biomass by S. cerevisiae
has advanced dramatically in recent years. In this review we discuss these recent
developments as they relate to feedstock utilization as well as production of fuels 
and chemicals with additional insight on the future economic outlook of these 
processes. 


2 Yeast Fermentation Technologies 


With improvements in metabolic engineering techniques since their
advent in the 1970s and the more recent development of synthetic biology
procedures, yeast engineering technologies have grown dramatically [16]. Many yeast
engineering approaches follow a scheme known as the “Design, Build, Test, and
Learn” cycle [25, 26]. This scheme first requires a target outcome or goal. For
example, a target goal could be to produce ethanol from the pentose sugar xylose by
engineered S. cerevisiae, which natively cannot ferment xylose.

Once the desired outcome is determined, a parental yeast strain, often a wild- 
type strain, is selected as the target organism to be engineered. The steps for 
engineering the parental strain are as follows: (1) Designing the specific yeast
engineering steps, including plasmids and transformation protocols, (2) Building 
the engineered strain by introduction of target genetic perturbations, (3) Testing the 
newly-developed strain, often involving fermentation and sampling, and (4) Learn- 
ing from the new strain (Fig. 1). The new knowledge obtained from this process can 
then be factored into the design of the next strain and the cycle can repeat until the 
target outcome is reached. This systematic approach has led to significant advances 
in the development of engineered S. cerevisiae capable of fermenting novel sub- 
strates for the production of fuels and chemicals. Although not all studies explicitly 
state this four-step process, the general concept is applicable in many cases. 
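The iterative logic of the cycle can be written as a short loop. The sketch below is a toy under explicit assumptions: a single hypothetical design variable (an expression level from 1 to 10), a made-up titer response that stands in for fermentation results, and a naive learning step that only records results. None of the functions correspond to a real strain-engineering tool.

```python
from typing import Callable, Optional

Record = tuple[int, float]  # (design level, measured titer)


def run_dbtl(design: Callable[[list[Record]], int],
             build: Callable[[int], int],
             test: Callable[[int], float],
             learn: Callable[[list[Record], int, float], list[Record]],
             target: float,
             max_cycles: int = 10) -> Optional[tuple[int, int, float]]:
    """Repeat Design-Build-Test-Learn until the target outcome is reached.

    Returns (cycle, design, result), or None if max_cycles is exhausted.
    """
    knowledge: list[Record] = []
    for cycle in range(1, max_cycles + 1):
        plan = design(knowledge)                    # Design: choose next perturbation
        strain = build(plan)                        # Build: construct the strain
        result = test(strain)                       # Test: ferment and measure
        knowledge = learn(knowledge, plan, result)  # Learn: update knowledge
        if result >= target:
            return cycle, plan, result
    return None


# Hypothetical stand-ins: titer peaks at an intermediate expression level.
def toy_design(knowledge: list[Record]) -> int:
    return knowledge[-1][0] + 1 if knowledge else 1


def toy_build(level: int) -> int:
    return level


def toy_test(level: int) -> float:
    return max(0.0, 20.0 - 2.0 * (level - 6) ** 2)


def toy_learn(knowledge: list[Record], plan: int, result: float) -> list[Record]:
    return knowledge + [(plan, result)]


print(run_dbtl(toy_design, toy_build, toy_test, toy_learn, target=18.0))
```

The loop stops as soon as the target is met; in practice the design and learn steps would draw on omics data, models, and prior strains rather than a simple increment.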




[Figure panels: Native S. cerevisiae; Pathway engineering for target chemical production; Evaluation of the engineered strains; Data interpretation and analysis; Industrial application of the engineered strains]


Fig. 1 Schematic demonstrating the step-by-step process for the Design, Build, Test, and Learn 
metabolic engineering/synthetic biology cycle used to develop engineered Saccharomyces 
cerevisiae for industrial-scale production of renewable fuels and chemicals 


2.1 Major Objectives and Feedstocks for Yeast
Fermentations 


The first-generation biofuels from cornstarch or sugarcane juice have been
industrialized for decades; however, the food-vs-fuel conflict has limited their
further expansion [27]. Ethanol is the major first-generation biofuel and the one
produced in the largest volume. Despite the food-vs-fuel concerns, 23.8 billion gallons of
ethanol are produced annually, primarily from cornstarch or sugarcane juice 
[28]. Transitioning from first-generation biofuel feedstocks (cornstarch and sugar- 
cane juice) to second-generation feedstocks (lignocellulosic biomass) is a key 
objective of modern yeast fermentation research. 

The second-generation biofuels from non-food lignocellulosic biomass, which is
a renewable carbon source, have offered an excellent opportunity to address the
food-vs-fuel issue [29, 30]. Lignocellulosic hydrolysates obtained from corn stover 
[31], bagasse [32, 33], sorghum biomass [34], and marine plants [35, 36] after 
pretreatment and hydrolysis contain substantial amounts of hexoses (six-carbon, C6 
sugar) and pentoses (five-carbon, C5 sugar) which can be used as renewable carbon 
sources for the production of bioethanol and other value-added products (Fig. 2). 
Lignocellulosic hydrolysates are commonly composed of ~70% cellodextrins and 
glucose and 30% xylose [37], although this can vary by biomass source and 
processing protocol. Marine hydrolysate sugar compositions can vary widely: as a
percent of total solids, red algae hydrolysates can be composed of ~18% glucose,
~30% total of galactose/xylose/arabinose, and ~8% mannose; green algae
hydrolysates can consist of ~8% glucose, 6% total of galactose/xylose/arabinose,
and 5% mannose; finally, brown algae can be composed of 6–7% glucose and between 2%
(Sargassum fulvellum) and 30% (Laminaria japonica) mannitol [38]. However,
natively, the yeast S. cerevisiae cannot use pentoses, such as xylose, and cannot
efficiently ferment all hexoses. Therefore, another major objective of yeast
fermentation research is to expand the range of sugars that S. cerevisiae can
ferment for the purpose of industrial fermentation (Fig. 3). In
Sect. 2.2 we discuss the currently available substrates for native and engineered
S. cerevisiae strains.


[Figure panels: Renewable biomass-derived sugars (cellobiose, xylose, lactose, galactose); Novel substrate utilization; Metabolic engineering of S. cerevisiae; Valuable chemical production (fuels, pharmaceuticals, sugar alcohols/rare sugars)]

Fig. 2 A selection of major sugar substrates (inputs) which are processed by engineered yeast to

Fig. 3 A diagram of substrates which are fermentable by Saccharomyces cerevisiae via native or
heterologous (blue text) pathways


2.2 Native and Non-Native Substrate Utilization by
Saccharomyces cerevisiae 


Glucose 


Glucose is the preferred carbon source for S. cerevisiae [39] and can be
fermented more rapidly than any other sugar. To date, no other carbon source has
been found to be consumed more rapidly or efficiently than glucose by any
wild-type or engineered S. cerevisiae. The major industrial source of glucose is
cornstarch, although wheat is also sometimes used. The first generation of biofuels
is based on the hydrolysis of cornstarch, and very high gravity (VHG) fermentations
have been conducted to decrease process costs [32, 40, 41]. Several studies have
focused on enhancing the fitness of S. cerevisiae in the presence of high
concentrations of glucose. For example, Guadalupe-Medina et al. created a GPD1- and
GPD2-negative S. cerevisiae that anaerobically produced ethanol at a high yield
from glucose [42]. However, this strain became sensitive to high concentrations of
ethanol; the problem was alleviated by employing a laboratory evolution
strategy with serial subculturing of the GPD1/GPD2-deleted strain on ethanol
[42]. Because glucose fermentations by S. cerevisiae are very rapid and efficient, 
further improvements for glucose fermentations by engineered S. cerevisiae are 
likely to focus on improving strain tolerance to harsh fermentation media condi- 
tions, especially those found in cellulosic hydrolysates. 


Xylose 


Harvested terrestrial biomass is processed into a sludge-like product known as a 
hydrolysate. In terrestrial biomass, hydrolysates contain both C6 and C5 sugars. 
However, the most widely-used fermenting microorganism, S. cerevisiae, cannot 
metabolize pentose sugars such as xylose and arabinose which are abundant in 
cellulosic hydrolysates. Therefore, numerous studies have attempted to construct 
metabolically engineered S. cerevisiae capable of fermenting pentose as rapidly as 
glucose [43-45]. Xylose metabolism can be introduced into S. cerevisiae using a 
bacterial or fungal metabolic route for xylose assimilation [46, 47]. The bacterial 
pathway uses only one enzyme, xylose isomerase (XI), for converting xylose into 
xylulose [44, 45]. Xylulose is later phosphorylated by xylulose kinase (XK) into 
xylulose-5-phosphate (X5P) and then enters the non-oxidative pentose phosphate 
pathway (PPP) for further metabolism toward pyruvate. Using the XI pathway, an 
ethanol yield from xylose as high as 0.45 g/g has been achieved [44]. Lee et al.
engineered S. cerevisiae to harbor the bacterial-type (XI) xylose pathway by
expressing a mutant xylose isomerase (xylA3*) from the fungus Piromyces sp.,
combined with deletion of aldose reductase (GRE3) and PHO13 and overexpression
of the S. cerevisiae native xylulokinase (XKS1) and S. stipitis transaldolase (TAL1)
[44]. Zhou et al. also overexpressed the Piromyces sp. xylose isomerase gene
(XYLA), S. stipitis xylulose kinase (XYL3), and genes of the non-oxidative pentose
phosphate pathway [45].

The fungal xylose assimilation pathway consists of two oxidoreductases, 
NADPH-linked xylose reductase (XR) and NAD-linked xylitol dehydrogenase 
(XDH) [43]. Several researchers developed platforms for consuming these specific 
substrates, such as introducing xylose-metabolizing enzymes into S. cerevisiae to 
produce a rapid and efficient xylose-fermenting strain [47-50]. For example, Kim 
et al. introduced the fungal pathway by strong and balanced expression of genes 
from Scheffersomyces stipitis consisting of xylose reductase (XR, encoded by 
XYL1), xylitol dehydrogenase (XDH, encoded by XYL2), and xylulose kinase 
(XK, encoded by XYL3), combined with disruption of the genes encoding alkaline
phosphatase (PHO13) and acetaldehyde dehydrogenase (ALD6) [43]. This series of
genetic manipulations using the fungal XR/XDH/XK pathway resulted in an etha- 
nol yield of 0.35 g/g from xylose [43]. 

Collectively, these studies have developed numerous xylose-fermenting 
S. cerevisiae capable of rapid and efficient xylose fermentation. Despite these 
advances, even the fastest xylose fermentations by engineered yeasts are still slower 
than the fastest glucose fermentations, and so further studies to improve xylose 
fermentation rates and yields by S. cerevisiae are ongoing. 
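The xylose yields above are easier to compare as a fraction of the stoichiometric maximum. The sketch below assumes the textbook fermentation stoichiometry 3 C5H10O5 → 5 C2H5OH + 5 CO2 (no biomass or byproduct formation), which gives a maximum of about 0.51 g ethanol per g xylose, the same as for glucose via C6H12O6 → 2 C2H5OH + 2 CO2. This maximum is not stated in the chapter, and the helper names are illustrative.

```python
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999}


def molar_mass(c: int, h: int, o: int) -> float:
    return c * ATOMIC_MASS["C"] + h * ATOMIC_MASS["H"] + o * ATOMIC_MASS["O"]


ETHANOL = molar_mass(2, 6, 1)   # C2H5OH
XYLOSE = molar_mass(5, 10, 5)   # C5H10O5

# Assumed ideal stoichiometry: 3 xylose -> 5 ethanol + 5 CO2
MAX_YIELD_XYLOSE = 5 * ETHANOL / (3 * XYLOSE)


def percent_of_theoretical(observed_g_per_g: float,
                           max_g_per_g: float = MAX_YIELD_XYLOSE) -> float:
    return 100.0 * observed_g_per_g / max_g_per_g


print(f"theoretical maximum: {MAX_YIELD_XYLOSE:.3f} g/g")
for label, y in [("XI pathway", 0.45), ("XR/XDH/XK pathway", 0.35)]:
    print(f"{label}: {y} g/g = {percent_of_theoretical(y):.0f}% of theoretical")
```

On this basis, the reported 0.45 g/g corresponds to about 88% and 0.35 g/g to about 68% of the theoretical maximum.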


Arabinose 


Similar to xylose metabolism, different L-arabinose metabolizing pathways have
been identified in bacteria [51] and fungi [52, 53]. The bacterial pathway for
L-arabinose utilization converts L-arabinose into X5P via L-ribulose-5-phosphate
(L5P) using three enzymes (an isomerase, a kinase, and an epimerase). When
L-arabinose isomerase (araA), L-ribulokinase (araB), and L-ribulose-5-phosphate
4-epimerase (araD) from Lactobacillus plantarum were expressed in
S. cerevisiae, L-arabinose fermentation was observed [51]. The fungal L-arabinose
utilization pathway converts L-arabinose into L-arabinitol by aldose reductase
(GRE3 from S. cerevisiae or XYL1 from Scheffersomyces stipitis), then into
L-xylulose by L-arabinitol 4-dehydrogenase (LAD from Trichoderma reesei),
xylitol by L-xylulose reductase (LXR from T. reesei), D-xylulose by xylitol
dehydrogenase (XDH from S. stipitis), and lastly X5P by xylulokinase (XYL3)
[52, 53]. As X5P is a gateway metabolite in the PPP, it can be converted to
pyruvate and ethanol. Recently, researchers have used codon-optimized
bacterial pathways for L-arabinose fermentation in S. cerevisiae because
of the inefficient L-arabinose utilization and high byproduct (L-arabinitol) yield of
fungal pathways caused by severe redox imbalance [54].
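The redox imbalance of the fungal L-arabinose route becomes visible when cofactors are tallied step by step. The sketch below assumes the commonly described cofactor preferences, which the chapter does not spell out: NADPH for the aldose reductase and L-xylulose reductase steps, NAD+ for the two dehydrogenase steps, and ATP for xylulokinase. Some of these enzymes can use both NADH and NADPH in reality, so the tally is an idealized illustration with hypothetical names.

```python
from collections import Counter

# (substrate, enzyme, product, cofactor change per molecule); cofactor usage
# is an assumption for illustration, not taken from the chapter.
FUNGAL_ARABINOSE_PATHWAY = [
    ("L-arabinose", "aldose reductase (GRE3 or XYL1)", "L-arabinitol",
     {"NADPH": -1, "NADP+": +1}),
    ("L-arabinitol", "L-arabinitol 4-dehydrogenase (LAD)", "L-xylulose",
     {"NAD+": -1, "NADH": +1}),
    ("L-xylulose", "L-xylulose reductase (LXR)", "xylitol",
     {"NADPH": -1, "NADP+": +1}),
    ("xylitol", "xylitol dehydrogenase (XDH)", "D-xylulose",
     {"NAD+": -1, "NADH": +1}),
    ("D-xylulose", "xylulokinase (XYL3)", "X5P",
     {"ATP": -1, "ADP": +1}),
]


def net_cofactors(pathway) -> dict[str, int]:
    """Sum cofactor changes over a linear pathway, checking step continuity."""
    total = Counter()
    for i, (substrate, _enzyme, _product, changes) in enumerate(pathway):
        # Each step must consume the product of the previous step.
        if i > 0 and substrate != pathway[i - 1][2]:
            raise ValueError(f"step {i} does not follow from {pathway[i - 1][2]}")
        total.update(changes)  # Counter.update adds (possibly negative) counts
    return {k: v for k, v in total.items() if v != 0}


print(net_cofactors(FUNGAL_ARABINOSE_PATHWAY))
```

Under these assumptions, each L-arabinose consumes two NADPH and generates two NADH, with no step that regenerates the cofactors in balance; this is consistent with the L-arabinitol accumulation noted above.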

Other than introducing xylose and arabinose pathways into S. cerevisiae, known 
hexose transporters, as potential xylose and arabinose transporters, have been 
investigated. Several hexose transporters were proven to be responsible for the 
uptake of pentose sugars. For instance, Hxt7p, Hxt5p, and Gal2p improve xylose 
uptake [55] and Gal2p also facilitates the transport of L-arabinose [56]. However, 
these hexose transporters exhibit very low affinity for pentoses and prefer
D-glucose. Therefore, to improve xylose and L-arabinose fermentations,
efforts were made to find high-affinity xylose- or L-arabinose-specific transporters.
Heterologous transporters with higher affinity for xylose than for glucose were
discovered, such as Gxs1p from Candida intermedia [57], Xut3p from S. stipitis
[58], and Mgt05196p from Meyerozyma guilliermondii [59]. Nonetheless, it is 
still challenging to have both the specificity and efficiency of xylose transport, 
and further evolutionary adaptation and protein engineering are required 
[59, 60]. Heterologous overexpression of STP2 from Arabidopsis thaliana and 
ARAT from S. stipitis in S. cerevisiae also led to improved anaerobic L-arabinose 
fermentation, especially at low L-arabinose concentrations, although these two 
transporters still are inhibited in the presence of glucose [61]. Recently, Wang 
et al. have engineered an S. cerevisiae strain capable of producing an ethanol yield 
of 0.43 g/g from arabinose, one of the highest reported yields to date [62]. 


Cellobiose 


Another major sugar of interest is cellobiose, a β(1,4)-linked dimer of D-glucose,
which is readily released from larger cellodextrins from cellulose by cellulases after
acidic treatment of terrestrial biomass [63]. However, S. cerevisiae cannot naturally
metabolize cellobiose because it lacks a cellobiose transporter and an intracellular
β-glucosidase. A high-affinity cellodextrin transporter (cdt-1 or cdt-2) and an
intracellular β-glucosidase (gh1-1) were identified from the cellulolytic fungus
Neurospora crassa [64]. The cellobiose transporters and the intracellular
β-glucosidase promote efficient cellobiose fermentation and ethanol production
when expressed in S. cerevisiae [64]. The intracellular β-glucosidase can be
replaced by cellobiose phosphorylase, which produces glucose and
glucose-1-phosphate from cellobiose. Efficient cellobiose fermentation by engineered yeast
expressing a cellobiose transporter and a bacterial cellobiose phosphorylase has
also been demonstrated [65]. Because cellobiose, unlike glucose, does not repress
the utilization of other carbon sources, simultaneous cofermentation of cellobiose and xylose
[66, 67] as well as cellobiose and galactose [68] has been achieved. Simultaneous
cofermentation is necessary for efficient and rapid industrial-scale fermentation of
hydrolysates.
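The choice between β-glucosidase and cellobiose phosphorylase also matters energetically. A simplified ATP tally illustrates why. It assumes standard textbook biochemistry that the chapter does not state: free glucose is phosphorylated by hexokinase at the cost of one ATP, while glucose-1-phosphate is converted to glucose-6-phosphate by phosphoglucomutase without ATP. Transport costs are ignored, and the names are hypothetical.

```python
# ATP needed to bring each cleavage product to glucose-6-phosphate
# (assumption: hexokinase for free glucose, phosphoglucomutase for G1P).
ATP_TO_G6P = {"glucose": 1, "glucose-1-phosphate": 0}

CLEAVAGE_PRODUCTS = {
    # hydrolysis: cellobiose + H2O -> 2 glucose
    "beta-glucosidase": ["glucose", "glucose"],
    # phosphorolysis: cellobiose + Pi -> glucose + glucose-1-phosphate
    "cellobiose phosphorylase": ["glucose", "glucose-1-phosphate"],
}


def atp_to_activate(route: str) -> int:
    """ATP spent per cellobiose to reach two glucose-6-phosphate."""
    return sum(ATP_TO_G6P[p] for p in CLEAVAGE_PRODUCTS[route])


for route in CLEAVAGE_PRODUCTS:
    print(route, atp_to_activate(route))
```

Under these assumptions, phosphorolytic cleavage saves one ATP per cellobiose before glycolysis.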


Alginate and Mannitol 


Another type of sustainable non-lignocellulosic biomass is marine biomass, such as
macroalgae or seaweed. The most abundant sugars in brown macroalgae are
alginate, mannitol, and glucan (present as laminarin or cellulose). However,
industrial microbes are unable to metabolize alginate, which represents
30–60% of total sugars in brown macroalgae. Alginate is a linear block copolymer
of two uronates, β-D-mannuronate (M) and α-L-guluronate (G), arranged in varying
sequences [69]. Some microbes can metabolize alginate natively by


Engineering and Evolution of Saccharomyces cerevisiae to Produce... 183 


depolymerization of alginate into oligomers by alginate lyases (Alys). These 
oligomers are further degraded into unsaturated monomers by oligoalginate lyase 
(Oal) and the monomers are rearranged spontaneously into 4-deoxy-L-erythro-5- 
hexoseulose uronic acid (DEH). DEH is then converted into 2-keto-3-deoxyl- 
glucontate (KDG) by DEH reductase (DehR), and KDG is a common metabolite 
that can enter into the Entner—Doudoroff (ED) pathway and yield pyruvate and 
glyceraldehydrate-3-phosphate via KDG kinase (KdgK) and KDG-6-aldolase 
(Eda). 
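The alginate catabolic route above is a linear chain of enzymatic and spontaneous steps. Writing it as data makes the order explicit. The sketch below encodes each step as an (enzyme, substrate, product) tuple and checks that every product feeds the next step. The enzyme and metabolite names come from the text. The list layout and the helper functions `check_pathway` and `trace` are hypothetical illustrations, not part of any cited work.

```python
# Each step: (enzyme or process, substrate, product), in pathway order.
ALGINATE_PATHWAY = [
    ("Aly (alginate lyase)", "alginate", "alginate oligomers"),
    ("Oal (oligoalginate lyase)", "alginate oligomers", "unsaturated monomers"),
    ("spontaneous rearrangement", "unsaturated monomers", "DEH"),
    ("DehR (DEH reductase)", "DEH", "KDG"),
    ("KdgK (KDG kinase)", "KDG", "KDPG"),
    ("Eda (KDPG aldolase)", "KDPG", "pyruvate + glyceraldehyde-3-phosphate"),
]


def check_pathway(steps):
    """Return True if each step's substrate is the previous step's product."""
    return all(prev[2] == curr[1] for prev, curr in zip(steps, steps[1:]))


def trace(steps):
    """Return the metabolite sequence from the first substrate to the last product."""
    return [steps[0][1]] + [product for _, _, product in steps]


if check_pathway(ALGINATE_PATHWAY):
    print(" -> ".join(trace(ALGINATE_PATHWAY)))
```

A flat list is used because the route is unbranched. If branches such as mannitol oxidation needed to be modelled alongside it, a graph representation would be the more natural choice.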

However, these natural alginate degraders, such as Sphingomonas sp., lack the robustness needed under industrial fermentation conditions, and few genetic and metabolic engineering tools are available for them. Researchers therefore introduced the genes responsible for alginate degradation, transport, and metabolism into the well-characterized microorganism Escherichia coli, which naturally utilizes mannitol and D-glucose. A 36-kb DNA fragment from Vibrio splendidus encoding the enzymes necessary for alginate degradation, transport, and metabolism was identified. After alginate metabolism had been introduced, a heterologous homoethanol pathway was also added for efficient ethanol production. This pathway consisted of Zymomonas mobilis pyruvate decarboxylase (pdc) and alcohol dehydrogenase B (adhB) [70], and the work demonstrated the feasibility of using macroalgae as a feedstock for microbial ethanol production.

Although engineered E. coli provided the proof of concept for metabolizing alginate, mannitol, and D-glucose, S. cerevisiae is a more amenable host for industrial-scale ethanol production. Enquist-Newman et al. therefore re-engineered the alginate and mannitol catabolic pathways into S. cerevisiae [71]. They discovered an alginate monomer (DEHU) transporter in the alginolytic eukaryote Asteromyces cruciatus. They integrated this transporter into the genome and overexpressed it, together with the necessary bacterial alginate degradation genes and the genes essential for mannitol consumption. The latter included an NAD+-dependent mannitol-2-dehydrogenase (M2DH) and a mannitol transporter. The resulting S. cerevisiae strain was able to metabolize DEHU and mannitol [71], producing ethanol from them at 83% of the maximum theoretical yield.


Galactose 


S. cerevisiae is naturally capable of fermenting galactose, a C6 monosaccharide, into ethanol through the Leloir pathway. In the Leloir pathway, galactose is phosphorylated to galactose-1-phosphate, which is converted to glucose-1-phosphate in a uridylyl-transfer reaction with UDP-glucose. Phosphoglucomutase then converts glucose-1-phosphate to glucose-6-phosphate. Although downstream metabolism is identical to that of glucose, the ethanol yield and productivity of S. cerevisiae on galactose are significantly lower than on glucose [72]. Lee et al. overexpressed a truncated TUP1 gene, which encodes a general transcriptional repressor, and thereby improved the galactose consumption rate and ethanol productivity by 250% relative to a control S. cerevisiae strain [72]. Combining enhanced galactose metabolism with a heterologous cellobiose pathway yields an engineered S. cerevisiae that could be employed for fermenting red seaweed hydrolysates. The major components of red seaweed (Gelidium amansii), cellulose and galactan, can be hydrolyzed to produce a mixture of cellobiose and galactose [36, 68]. Engineered yeast can coferment cellobiose and galactose [68] because the two sugars are transported with high affinity by independent transporters (CDT-1 and Gal2). Recently, one research group has used high galactose concentrations as a selection pressure for adaptive evolution to improve galactose consumption rates and ethanol productivity [73, 74]. Further improvements in producing ethanol from galactose and red seaweed are necessary for industrial-scale production, especially as demand for second-generation biofuels continues to grow.


Acetate 


Acetate, one of the major inhibitors present in lignocellulosic hydrolysates, can hamper S. cerevisiae fermentation. Acetate is also a major product of lignin pyrolysis [75, 76]. Recently, an interesting solution was developed that turns acetate from a fermentation inhibitor into a useful co-substrate. Coupling the consumption of acetate and xylose alleviates the redox imbalance of xylose fermentation through the heterologous xylose reductase (XR), xylitol dehydrogenase (XDH), and xylulokinase (XK) pathway, while also detoxifying the inhibitor (acetate) [77]. As a result, the overall bioethanol fermentation process improved relative to the control. Ethanol yield increased by 6% (to 0.414 g/g), and byproduct formation decreased by 11% [77]. This process was advanced further by engineering an S. cerevisiae strain that also expresses a cellobiose-utilizing pathway alongside the acetate and xylose pathways. That strain efficiently ferments multiple lignocellulosic sugars (xylose and cellobiose) together with a fermentation inhibitor (acetate) [78]. In a further study, an XR/XDH-expressing S. cerevisiae reached a peak ethanol yield of 0.463 g ethanol/g xylose through upregulation of acetylating acetaldehyde dehydrogenase (AADH) and acetyl-CoA synthetase (ACS) [79]. Compared with the control strain, the engineered strain produced 18.4% more ethanol and 41.3% less glycerol, and consumed 4.1 g/L of acetate from a cellulosic hydrolysate [79]. Collectively, these acetate-utilization studies are a significant breakthrough for the in situ detoxification of acetate by S. cerevisiae during ethanol production. Additional improvements could make this fermentation ready for industrial scale.
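The redox logic of coupling acetate and xylose consumption can be made concrete with a deliberately simplified cofactor balance. The sketch makes several assumptions. It counts only the XR/XDH step. It treats the share of XR activity that uses NADH as a free parameter (`xr_nadh_fraction`, a hypothetical name). It assumes that reducing one acetate to ethanol via ACS, AADH, and ADH reoxidizes two NADH. Biomass formation, NADPH regeneration, and ATP costs (ACS consumes ATP) are ignored, so the numbers are illustrative only.

```python
# AADH (acetyl-CoA -> acetaldehyde) and ADH (acetaldehyde -> ethanol)
# each consume one NADH; ACS (acetate -> acetyl-CoA) consumes ATP, not NADH.
NADH_PER_ACETATE_REDUCED = 2


def surplus_nadh_per_xylose(xr_nadh_fraction: float) -> float:
    """Net NADH left over per mol xylose from the XR/XDH step alone.

    XR consumes one NAD(P)H, of which `xr_nadh_fraction` is NADH;
    XDH always produces one NADH.
    """
    if not 0.0 <= xr_nadh_fraction <= 1.0:
        raise ValueError("xr_nadh_fraction must lie between 0 and 1")
    return 1.0 - xr_nadh_fraction


def acetate_to_balance(xylose_mol: float, xr_nadh_fraction: float) -> float:
    """mol acetate whose reduction to ethanol would reoxidize the surplus NADH."""
    surplus = xylose_mol * surplus_nadh_per_xylose(xr_nadh_fraction)
    return surplus / NADH_PER_ACETATE_REDUCED


for frac in (0.0, 0.5, 1.0):
    print(f"XR NADH share {frac:.1f}: {acetate_to_balance(1.0, frac):.2f} mol acetate per mol xylose")
```

In this toy model, a strictly NADPH-dependent XR leaves one surplus NADH per xylose, which half a mole of acetate could absorb. This illustrates why acetate can act as an electron sink in place of glycerol or xylitol formation.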


Lactose 


Lactose is a disaccharide consisting of glucose and galactose. It comes primarily from milk and fermented dairy products, and the dairy industry produces millions of tons of it annually. A significant amount of lactose remains trapped in acid whey, the harsh and acidic slurry left by acid whey fermentation processes, and many studies have sought efficient uses for this lactose.

Several studies have attempted to create a lactose-consuming S. cerevisiae by introducing LAC4 (encoding β-galactosidase) and LAC12 (encoding lactose permease) from Kluyveromyces marxianus and Kluyveromyces lactis [80-84]. These studies produced engineered S. cerevisiae strains capable of fermenting lactose. Zou et al. integrated LAC4 and LAC12 into the MIG1 and NTH1 loci, respectively, and obtained a strain that produced 63.3 g/L of ethanol from approximately 150 g/L lactose in concentrated cheese whey within 120 h [84]. Because these integrations disrupted MIG1 and NTH1, glucose repression in the engineered strain was greatly reduced. Kluyveromyces spp. can natively ferment lactose, but their genetics are not as well understood as those of S. cerevisiae. Engineering S. cerevisiae for lactose fermentation may therefore be the more attractive route.
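The Zou et al. figures can be put in context with a short calculation of theoretical yield and productivity. The sketch rests on two assumptions. First, lactose is fully hydrolyzed and fermented (lactose + H2O -> 4 ethanol + 4 CO2). Second, all of the roughly 150 g/L lactose was consumed, which the text does not state. Molar masses are rounded standard values, and the function names are hypothetical.

```python
MW_LACTOSE = 342.30       # g/mol, C12H22O11
MW_ETHANOL = 46.07        # g/mol, C2H6O
ETHANOL_PER_LACTOSE = 4   # lactose + H2O -> glucose + galactose -> 4 ethanol + 4 CO2


def theoretical_yield_lactose() -> float:
    """Maximum ethanol yield in g ethanol per g lactose."""
    return ETHANOL_PER_LACTOSE * MW_ETHANOL / MW_LACTOSE


def fraction_of_theoretical(ethanol_g_per_l: float, lactose_consumed_g_per_l: float) -> float:
    """Observed yield divided by the theoretical maximum."""
    observed = ethanol_g_per_l / lactose_consumed_g_per_l
    return observed / theoretical_yield_lactose()


def volumetric_productivity(ethanol_g_per_l: float, hours: float) -> float:
    """Average ethanol productivity in g/L/h over the whole fermentation."""
    return ethanol_g_per_l / hours


print(f"max yield: {theoretical_yield_lactose():.3f} g/g lactose")
print(f"fraction of max: {fraction_of_theoretical(63.3, 150.0):.1%}")
print(f"productivity: {volumetric_productivity(63.3, 120.0):.3f} g/L/h")
```

Under these assumptions, the strain reached a little under 80% of the theoretical yield. If some lactose remained unconsumed, the yield on consumed lactose would be correspondingly higher.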


3 Biofuel Production by Engineered or Evolved Yeast 


S. cerevisiae offers many advantages for producing sustainable and economically viable biofuels from renewable feedstocks. It has been widely used as an important eukaryotic model for fundamental molecular biology research. More synthetic biology tools have been developed for it than for most other microorganisms, perhaps second only to E. coli. Recently developed yeast engineering tools include zinc-finger nucleases [85], yeast oligo-mediated genome engineering [21], and, most notably, the clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated protein 9 (Cas9) system [22]. Moreover, S. cerevisiae is considered more robust than E. coli, with a higher tolerance to low pH and high acid concentrations, which makes the yeast the preferred host for fermenting biomass hydrolysates. S. cerevisiae has been used extensively as a platform cell factory for first-generation, industrial-scale bioethanol production [86]. Because of its robustness under harsh fermentation conditions and the wide availability of yeast engineering tools, S. cerevisiae has been engineered with new metabolic pathways to produce products beyond bioethanol (Fig. 4 and Table 1).
Fig. 4 A diagram of fuels that can be produced by Saccharomyces cerevisiae via native or
heterologous (blue text) pathways 


3.1 Biofuel Production by S. cerevisiae 
Ethanol 


First-generation biofuel production focused almost entirely on producing bioethanol from corn or sugarcane juice. Many research directions were investigated to improve bioethanol production, and one major direction focused on glycerol, a common byproduct of ethanol fermentations. During anaerobic yeast fermentations, biomass formation (the biosynthesis of proteins, nucleic acids, and lipids) generates a surplus of reduced cytosolic redox cofactors such as NADH. Glycerol formation serves as an essential electron sink that reoxidizes this NADH to NAD+ in the cytosol. Tremendous research effort has therefore gone into minimizing formation of the unwanted glycerol byproduct during bioethanol production, because carbon directed toward glycerol reduces the carbon available for ethanol synthesis.


Table 1 Biobased fuels from Saccharomyces cerevisiae

Product | Substrate | Result | Genetic modification(s) | Reference
Ethanol | Glucose | 97.8% tm | FPS1Δ GPD2Δ and 80-bp 3′ truncation of the native GPD1 promoter | [39]
Ethanol | Xylose | 0.35 g/g xylose | S. stipitis XYL1, XYL2, and XYL3 balanced expression and PHO13Δ ALD6Δ | [43]
Ethanol | Arabinose | 0.43 g/g arabinose | L. plantarum araA, araB, and araD expression and overexpression of TAL1, TKL1, RPE1, RKI1, and GAL2 with adaptive evolution | [62]
Ethanol | Cellobiose | 86.3% tm | N. crassa cdt-1 and gh1-1 integration | [64]
Ethanol | Xylose and cellobiose | 0.39 g/g xylose and cellobiose | S. stipitis XYL1, XYL2, and XYL3 and N. crassa cdt-1 and gh1-1 balanced expression | [66]
Ethanol | Cellobiose and galactose | 0.36 g/g galactose and cellobiose | N. crassa cdt-1 and gh1-1 integration | [68]
Ethanol | Mannitol and DEHU | 83% tm | A. cruciatus DEHU transporter with YEL070W, YNR073C, HXT13, HXT17, and YNR071C expression | [71]
Ethanol | Galactose | 0.46 g/g galactose | Laboratory evolution on galactose | [73]
Ethanol | Acetate and xylose | 6% improved yield and 11% reduced byproduct formation | E. coli adhE integration with S. stipitis XYL1, XYL2, and XYL3 balanced expression and PHO13Δ ALD6Δ | [77]
Ethanol | Acetate, xylose, and cellobiose | ~9% improved yield | E. coli adhE and N. crassa cdt-1 and gh1-1 integration with S. stipitis XYL1, XYL2, and XYL3 balanced expression and PHO13Δ ALD6Δ | [78]
Ethanol | Glucose | 10% improved yield | GLN1 and GLT1 overexpression and GDH1Δ | [90]
Ethanol | Glucose | 97.4% tm | B. cereus gapN, E. coli frdA, and mhpF expression | [94]
Ethanol | Glucose | 10% improved yield | FPS1Δ reducing glycerol production | [95]
Ethanol | Glucose | Tolerance up to 90 g/L EtOH in wheat liquefact SSF | Native GPD1 and GPD2 promoters replaced with lower-strength TEF1 promoter mutants in GPD1Δ or GPD2Δ strains | [96]
1-Butanol | Galactose | Tenfold increase | C. beijerinckii adhe2, hbd, and crt with S. cerevisiae ERG10 and S. collinus ccr expression | [98]
1-Butanol | Glucose | 16.3 mg/L titer | T. denticola Ter and S. enterica ACS2 expression, ADH2 and ALD6 overexpression, and MLS1Δ CIT2Δ | [99]
1-Butanol | Glucose | 120 mg/L titer | E. coli PDH genes and acetyl-CoA synthetase gene expression with ADH1Δ ADH4Δ GPD1Δ GPD2Δ | [100]
1-Butanol | Glucose | 242.8 mg/L titer | Leucine biosynthesis pathway overexpression and ILV2Δ ADH1Δ | [101]
Isobutanol | Glucose | 4.12 mg/g | ILV2, ILV3, ILV5, and BAT2 overexpression | [102]
Isobutanol | Glucose | 6.40 mg/g | Isobutanol pathway located in the mitochondria | [103]
Isobutanol | Glucose | 15 mg/g | δ-Integration used to assemble isobutanol pathway genes into the yeast chromosome | [104]
Isobutanol | Glucose | 1.62 g/L titer | PDH complex activity reduction (LPD1Δ) and transhydrogenase-like shunt expression | [106]
FAEE | Glucose | 6.3 mg/L titer | M. hydrocarbonoclasticus wax ester synthase expression | [109]
FAEE | Glucose | 6.3-fold increase | M. hydrocarbonoclasticus wax ester synthase, S. cerevisiae FAA1, and B. ammoniagenes bafas and ppt expression | [112]

tm theoretical maximum


Two structural genes, GPD1 and GPD2, encode cytosolic NADH-dependent glycerol-3-phosphate dehydrogenases and play important roles in redox balance and osmoregulation. GPD1 is induced primarily under high osmotic conditions, whereas GPD2 is induced primarily during anaerobic fermentation. Glycerol formation can be reduced by deleting one or both genes [87]. However, a GPD1 GPD2 double-deletion strain cannot grow anaerobically because it lacks an alternative pathway to reoxidize NADH. Single deletion of GPD2 improved ethanol yields by decreasing glycerol production, but it also hindered cell growth and ethanol productivity [88]. Reduced glycerol production also increased osmosensitivity and diminished the general robustness of the engineered yeast [89].
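The cost of glycerol's role as an electron sink follows from simple stoichiometry. Each glycerol formed (DHAP -> glycerol-3-phosphate -> glycerol) reoxidizes one NADH and consumes a C3 unit, which is half a glucose. Had it gone to ethanol, that half glucose would have yielded one ethanol (glucose -> 2 ethanol + 2 CO2). The sketch below applies this to a hypothetical NADH surplus. The 10 mmol figure is illustrative, not a measured value, and the function names are hypothetical.

```python
NADH_PER_GLYCEROL = 1.0      # Gpd1/Gpd2: DHAP + NADH -> glycerol-3-phosphate
GLUCOSE_PER_GLYCEROL = 0.5   # glycerol is C3, i.e. half a glucose
ETHANOL_PER_GLUCOSE = 2.0    # glucose -> 2 ethanol + 2 CO2


def glycerol_needed(surplus_nadh_mmol: float) -> float:
    """mmol glycerol required to reoxidize a given NADH surplus."""
    return surplus_nadh_mmol / NADH_PER_GLYCEROL


def ethanol_forgone(surplus_nadh_mmol: float) -> float:
    """mmol ethanol not produced because that carbon went to glycerol."""
    return glycerol_needed(surplus_nadh_mmol) * GLUCOSE_PER_GLYCEROL * ETHANOL_PER_GLUCOSE


surplus = 10.0  # hypothetical NADH surplus from biomass formation, mmol
print(f"glycerol: {glycerol_needed(surplus)} mmol, ethanol forgone: {ethanol_forgone(surplus)} mmol")
```

In this simplified accounting, every mmol of surplus NADH handled by glycerol costs roughly one mmol of ethanol. This is why the strategies that follow try to reoxidize NADH, or avoid generating it, by other routes.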

Other studies have focused on engineering cellular redox metabolism. One approach used an ammonium assimilation pathway that consumes NADH and ATP. Deleting the NADPH-dependent glutamate dehydrogenase gene GDH1 while co-overexpressing the glutamate synthase gene GLT1 and the glutamine synthetase gene GLN1 reduced glycerol formation by 38% and improved ethanol yield by 10% [90]. Another redox-engineering strategy reduces the formation of surplus cytosolic NADH, at the cost of lower ATP production. It replaces the native glyceraldehyde-3-phosphate dehydrogenase with a non-phosphorylating, NADP+-dependent glyceraldehyde-3-phosphate dehydrogenase (gapN) from Streptococcus mutans or Bacillus cereus [91, 92]. In an interesting demonstration of cofactor engineering, overexpression of E. coli mhpF restored anaerobic growth of a GPD1- and GPD2-deleted mutant in the presence of acetate, by reoxidizing NADH through the reduction of acetic acid to ethanol [93]. Combining these modifications also proved effective. Expressing gapN together with the E. coli NAD+-dependent fumarate reductase frdA or NAD+-dependent acetaldehyde dehydrogenase mhpF raised the ethanol yield above 97% of the theoretical maximum, an improvement over wild-type yeast [94]. Furthermore, gapN expression combined with TPS1 and TPS2 overexpression reduced glycerol production and improved ethanol yield [92]. Deleting FPS1, which encodes a glycerol facilitator, blocks glycerol export and provides yet another way to reduce glycerol production and improve ethanol yield [95].
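The trade-off behind the gapN strategy, less cytosolic NADH but also less ATP, follows from textbook glycolytic stoichiometry. The native GAPDH produces NADH and 1,3-bisphosphoglycerate, from which phosphoglycerate kinase (PGK) makes ATP. gapN oxidizes glyceraldehyde-3-phosphate directly to 3-phosphoglycerate with NADP+, so the PGK ATP is skipped. The sketch tallies net cofactors per glucose (glucose -> 2 pyruvate) for a given share of triose phosphate routed through gapN. The parameter name is hypothetical, and the model ignores everything downstream of pyruvate.

```python
def cofactors_per_glucose(gapn_fraction: float) -> dict[str, float]:
    """Net ATP, NADH, and NADPH per glucose for glucose -> 2 pyruvate.

    gapn_fraction: share of the two triose phosphates oxidized by gapN
    instead of the native GAPDH + PGK pair.
    """
    if not 0.0 <= gapn_fraction <= 1.0:
        raise ValueError("gapn_fraction must lie between 0 and 1")
    trioses = 2.0
    via_native = trioses * (1.0 - gapn_fraction)  # native GAPDH + PGK flux
    via_gapn = trioses * gapn_fraction            # gapN flux
    atp = (
        -2.0            # hexokinase + phosphofructokinase
        + via_native    # PGK step, only on the native route
        + trioses       # pyruvate kinase, on both routes
    )
    return {"ATP": atp, "NADH": via_native, "NADPH": via_gapn}


for f in (0.0, 0.5, 1.0):
    print(f, cofactors_per_glucose(f))
```

At full gapN flux, net glycolytic ATP falls to zero in this model. In practice, only partial replacement or co-expression alongside the native enzyme is workable, which is consistent with combining gapN with other modifications as described above.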

Promoter engineering has been used as an alternative approach to modulate the expression of GPD1 and GPD2. For example, S. cerevisiae mutants in which lower-strength TEF1 promoter variants replaced the native GPD1 and/or GPD2 promoters produced less glycerol and more ethanol, without reducing the robustness of the host strain toward osmotic stress [96]. In an FPS1- and GPD2-deleted strain background (KAM1S5 strain), mutants with 3′ truncations of the GPD1 promoter by 20, 60, or 80 bp displayed varied GPD1 expression strength and an unaffected osmotic response. In very-high-gravity (VHG) fermentations, glycerol production fell by 16% and 31% in the mutants with 60- and 80-bp truncated promoters, respectively. The ethanol yield reached 0.499 g/g in the mutant with the 80-bp truncated promoter [40].
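Yields such as 0.499 g/g are easiest to judge against the stoichiometric maximum. For hexoses, glucose -> 2 ethanol + 2 CO2. For pentoses such as xylose or arabinose, 3 C5H10O5 -> 5 ethanol + 5 CO2. Both give about 0.511 g/g. The sketch below computes these limits from rounded standard molar masses and expresses reported yields as a percentage. The function names are hypothetical. Papers may also use slightly different conventions (for example, glucose equivalents), so small differences from the percentages quoted in Table 1 are expected.

```python
MW_ETHANOL = 46.069   # g/mol, C2H6O
MW_GLUCOSE = 180.156  # g/mol, C6H12O6
MW_XYLOSE = 150.13    # g/mol, C5H10O5 (same formula as arabinose)


def max_ethanol_yield_hexose() -> float:
    """g ethanol per g hexose for C6H12O6 -> 2 C2H5OH + 2 CO2."""
    return 2 * MW_ETHANOL / MW_GLUCOSE


def max_ethanol_yield_pentose() -> float:
    """g ethanol per g pentose for 3 C5H10O5 -> 5 C2H5OH + 5 CO2."""
    return 5 * MW_ETHANOL / (3 * MW_XYLOSE)


def percent_of_theoretical(observed_g_per_g: float, pentose: bool = False) -> float:
    """Observed yield as a percentage of the stoichiometric maximum."""
    y_max = max_ethanol_yield_pentose() if pentose else max_ethanol_yield_hexose()
    return 100.0 * observed_g_per_g / y_max


print(f"hexose max:  {max_ethanol_yield_hexose():.3f} g/g")
print(f"pentose max: {max_ethanol_yield_pentose():.3f} g/g")
print(f"0.499 g/g glucose:  {percent_of_theoretical(0.499):.1f}%")
print(f"0.43 g/g arabinose: {percent_of_theoretical(0.43, pentose=True):.1f}%")
```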


1-Butanol 


Higher alcohols offer a higher energy density than ethanol and are considered potential next-generation gasoline substitutes. One of the primary target alcohols is 1-butanol. 1-Butanol has traditionally been produced by Clostridium species through the acetone-butanol-ethanol (ABE) fermentation process, and engineered E. coli has reached titers of up to 30 g/L [97]. Nevertheless, S. cerevisiae offers several advantages for 1-butanol production. Beyond its general robustness toward fermentation inhibitors and low pH, S. cerevisiae is not susceptible to phage contamination and better tolerates high 1-butanol concentrations. However, the native 1-butanol metabolic pathway in S. cerevisiae produces only low concentrations of 1-butanol, which prompted several laboratories to look for heterologous pathways to improve production.

Steen et al. expressed in S. cerevisiae several isozymes from different organisms to create a biosynthetic 1-butanol pathway, reaching a peak 1-butanol titer of 2.5 mg/L [98]. In this pathway, acetyl-CoA is converted into acetoacetyl-CoA, which is reduced to 3-hydroxybutyryl-CoA and then dehydrated to crotonyl-CoA. Crotonyl-CoA is reduced to butyryl-CoA, which is further reduced to butyraldehyde and finally to 1-butanol. The pathway consisted of a thiolase (ERG10) from S. cerevisiae, an NADH-dependent 3-hydroxybutyryl-CoA dehydrogenase (hbd) and crotonase (crt) from Clostridium beijerinckii, an NADH-dependent crotonyl-CoA reductase (ccr) from Streptomyces collinus, and butanol dehydrogenase (adhe2) from C. beijerinckii [98].
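A useful check on any pathway of this kind is whether its cofactor demand matches what upstream metabolism supplies. The sketch below lists the steps just described with the NADH each consumes, taking ccr as NADH-dependent as stated in the text. It compares their total with the NADH released when one glucose is converted to two acetyl-CoA. Two NADH come from glycolysis, and two more are assumed to come from a pyruvate dehydrogenase operating in the same compartment. That assumption does not hold for the native yeast PDH bypass, whose ALD6 step yields NADPH, and this mismatch is one reason later strategies focus on cytosolic acetyl-CoA and NADH supply. Names are hypothetical.

```python
# (enzyme, reaction, NADH consumed per 1-butanol), in pathway order
BUTANOL_PATHWAY = [
    ("ERG10 (thiolase)", "2 acetyl-CoA -> acetoacetyl-CoA", 0),
    ("hbd", "acetoacetyl-CoA -> 3-hydroxybutyryl-CoA", 1),
    ("crt (crotonase)", "3-hydroxybutyryl-CoA -> crotonyl-CoA + H2O", 0),
    ("ccr", "crotonyl-CoA -> butyryl-CoA", 1),
    ("adhe2", "butyryl-CoA -> butyraldehyde", 1),
    ("adhe2", "butyraldehyde -> 1-butanol", 1),
]

# Assumed NADH supply per glucose -> 2 acetyl-CoA:
# 2 from glyceraldehyde-3-phosphate dehydrogenase + 2 from a pyruvate dehydrogenase
NADH_SUPPLY_PER_GLUCOSE = 2 + 2


def nadh_demand(pathway) -> int:
    """Total NADH consumed by the listed steps."""
    return sum(nadh for _, _, nadh in pathway)


def nadh_balance_per_glucose(pathway) -> int:
    """Positive: NADH surplus; negative: NADH deficit (per glucose -> 1 butanol)."""
    return NADH_SUPPLY_PER_GLUCOSE - nadh_demand(pathway)


print("demand:", nadh_demand(BUTANOL_PATHWAY), "balance:", nadh_balance_per_glucose(BUTANOL_PATHWAY))
```

Under these assumptions, the pathway is redox-neutral (four NADH in, four out). A reductase that used NADPH instead would shift the balance, so the `nadh` values per step are the first thing to revisit when adapting the sketch.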

Krivoruchko et al. first increased 1-butanol titers to 6.6 mg/L by engineering yeast for higher flux toward cytosolic acetyl-CoA, the precursor for 1-butanol biosynthesis, in addition to overexpressing the heterologous 1-butanol pathway enzymes [99]. Their modifications were as follows. First, the NADH-dependent, crotonyl-CoA-specific trans-enoyl-CoA reductase (Ter) from Treponema denticola replaced ccr to avoid the reverse oxidation of butyryl-CoA to crotonyl-CoA. Second, to increase the cytosolic acetyl-CoA supply, they created a pyruvate dehydrogenase (PDH) bypass. This involved overexpressing endogenous alcohol dehydrogenase (ADH2), NADP-dependent aldehyde dehydrogenase (ALD6), a codon-optimized acetyl-CoA synthetase (ACS2) from Salmonella enterica, and acetyl-CoA acetyltransferase (ERG10). Lastly, deletion of malate synthase (MLS1) or citrate synthase (CIT2) reduced the drain of acetyl-CoA through the glyoxylate pathway, and the 1-butanol titer increased to 16.3 mg/L [99].

The intracellular availability of cytosolic acetyl-CoA is therefore considered an important factor for 1-butanol production in yeast. NADH availability could also be a strong driving force toward 1-butanol production. Accordingly, the genes encoding NADH-dependent alcohol dehydrogenase (ADH) and glycerol-3-phosphate dehydrogenase (GPD) can be deleted to increase NADH availability and reduce unwanted byproducts such as ethanol and glycerol. Lian et al. produced up to 120 mg/L 1-butanol by inactivating ADH and GPD and introducing the butanol biosynthesis pathway genes. Most importantly, they also introduced a PDH-bypass pathway, a cytosolically localized PDH, and ATP-dependent citrate lyase (ACL) [100].

Despite the limited accumulation of 1-butanol from the native S. cerevisiae pathway, some researchers have sought to improve that pathway, which proceeds through threonine catabolism. Si et al. utilized genes from leucine biosynthesis (LEU1, LEU2, LEU4, and LEU), together with threonine deaminase genes (ILV1/CHA1), 2-keto acid decarboxylases (KDCs) from Lactococcus lactis, and alcohol dehydrogenases (ADHs) from S. cerevisiae [101]. The multistep pathway converts L-threonine to 2-ketobutyrate, which the leucine biosynthesis enzymes extend to 2-ketovalerate. 2-Ketovalerate is then decarboxylated and reduced to 1-butanol. Deletion of ADH allowed the engineered S. cerevisiae to produce more than 120 mg/L of 1-butanol from glucose in complex yeast-peptone medium. Si et al. then amplified the leucine biosynthesis pathway via overexpression of several key genes and eliminated competing pathways. This achieved the highest reported 1-butanol titer in S. cerevisiae, 242.8 mg/L, in a strain with ADH1 and ILV2 deletions [101].
Isobutanol


Isobutanol is another target alcohol with a higher energy density than ethanol. Its biosynthesis is closely linked to branched-chain amino acid metabolism through the Ehrlich pathway. 2-Ketoisovalerate (KIV), an intermediate of valine biosynthesis, is decarboxylated to isobutyraldehyde by 2-ketoacid decarboxylase (KDC) and then reduced to isobutanol by alcohol dehydrogenase (ADH). However, KIV is synthesized in the yeast mitochondria, whereas the other two enzymes, Kdc and Adh, are found in the cytosol. For isobutanol synthesis in S. cerevisiae, pyruvate must therefore be transported into the mitochondria, and KIV must then be transported back into the cytosol.

The first report of isobutanol overproduction in yeast used simultaneous overexpression of endogenous genes (ILV2, ILV3, and ILV5) of the mitochondrial valine biosynthesis pathway. The resulting strain produced isobutanol with a yield of up to 0.97 mg isobutanol/g glucose in minimal medium [102]. Additional overexpression of the cytosolic branched-chain amino acid aminotransferase (BAT2) increased the isobutanol yield to 3.86 mg/g glucose [102]. Finally, the engineered yeast achieved a yield of 4.12 mg/g glucose under aerobic conditions in complex yeast-peptone medium [102]. Avalos et al. demonstrated that relocating the complete isobutanol pathway to the mitochondria substantially increased isobutanol production compared with the native pathway, which is split between the cytosol and the mitochondria. KDCs and ADHs were either overexpressed in the cytosol or imported into the mitochondria by fusing them with an N-terminal targeting signal. With mitochondrial targeting, the isobutanol yield reached up to 6.40 mg/g glucose, with a titer of up to 0.635 g/L [103]. This study suggested that higher availability of the KIV intermediate and increased local enzyme concentrations benefit isobutanol production. Another research group, Yuan and Ching, developed a similar approach, using a δ-integration system to assemble the genes into the yeast chromosomes, and reached an isobutanol yield of up to 15 mg/g glucose [104].

The opposite strategy is to relocate the pathway to the cytosol. The mitochondrial valine synthesis enzymes were relocalized and codon-optimized, and the decarboxylase (ARO10) and alcohol dehydrogenase (ADH2) genes were overexpressed. Together, these changes improved isobutanol production to a titer of 0.63 g/L and a yield of approximately 15 mg/g glucose [105]. Isobutanol production in engineered S. cerevisiae was further improved by two strategies. First, competing pathways were eliminated by deleting a pyruvate dehydrogenase complex component (LPD1), so that mitochondrial acetyl-CoA biosynthesis no longer competes for pyruvate. Second, the cofactor imbalance was resolved by implementing a transhydrogenase-like shunt. In this shunt, pyruvate is cyclically converted into oxaloacetate, malate, and back to pyruvate, with the net conversion of NADH to NADPH. The final isobutanol titer reached 1.62 g/L, with a yield of 16 mg/g glucose [106]. However, even this improved result is considerably below that of engineered E. coli, which has been reported to produce isobutanol titers of up to several grams per liter [107]. These results suggest that considerable improvements are necessary before yeast-based isobutanol production can be competitive on an industrial scale.
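To put the reported isobutanol yields in perspective, it helps to compare them with the stoichiometric maximum. That maximum is one isobutanol per glucose (C6H12O6 -> C4H10O + 2 CO2 + H2O), or about 0.41 g/g. The sketch below converts the yields quoted above from mg/g to a percentage of that limit. It uses rounded standard molar masses and hypothetical function names, and the bracketed labels simply echo the citations in the text.

```python
MW_GLUCOSE = 180.156    # g/mol, C6H12O6
MW_ISOBUTANOL = 74.123  # g/mol, C4H10O


def theoretical_isobutanol_yield() -> float:
    """g isobutanol per g glucose for C6H12O6 -> C4H10O + 2 CO2 + H2O."""
    return MW_ISOBUTANOL / MW_GLUCOSE


def percent_of_max(yield_mg_per_g: float) -> float:
    """Convert a yield in mg/g glucose to a percentage of the theoretical maximum."""
    return 100.0 * (yield_mg_per_g / 1000.0) / theoretical_isobutanol_yield()


reported = [("[102]", 4.12), ("[103]", 6.40), ("[104]", 15.0), ("[106]", 16.0)]
print(f"theoretical max: {theoretical_isobutanol_yield():.3f} g/g")
for label, y in reported:
    print(f"{label}: {y:5.2f} mg/g = {percent_of_max(y):.1f}% of max")
```

Even the best yeast yield cited here corresponds to only a few percent of the theoretical maximum, which quantifies the gap the preceding paragraph describes.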


Fatty Acids 


Fatty acids (FAs) and lipids are also valuable chemicals for numerous industrial applications. Glycerolipids are assembled on a glycerol-3-phosphate backbone from FAs synthesized from acetyl-CoA. Fatty acid ethyl esters (FAEEs), formed by esterification of fatty acyl-CoAs with ethanol, can be used for diesel or jet fuel production. Kalscheuer et al. first studied FAEE production in yeast [108]. They heterologously expressed a nonspecific bacterial acyltransferase, a wax ester synthase/acyl-coenzyme A:diacylglycerol acyltransferase (WS/DGAT), from Acinetobacter calcoaceticus ADP1. Later, Shi et al. screened five different wax ester synthases in S. cerevisiae. They found that the wax ester synthase from Marinobacter hydrocarbonoclasticus performed best, giving the highest FAEE titer of 6.3 mg/L [109]. Overexpression of acetyl-CoA carboxylase (ACC1) increased the FAEE titer to 8.2 mg/L [109]. de Jong et al. further enhanced FAEE production by enlarging the acetyl-CoA and NADPH pools that supply acyl-CoA synthesis, in two ways [110]. First, they overexpressed alcohol dehydrogenase (ADH2), acetaldehyde dehydrogenase (ALD6), and a heterologous acetyl-CoA synthetase variant from Salmonella enterica (acsSE L641P). This redirected carbon flow toward acetyl-CoA through the ethanol degradation pathway. The wax ester synthase from M. hydrocarbonoclasticus was also overexpressed. Second, they established a phosphoketolase pathway by expressing xpkA and ack from Aspergillus nidulans. These enzymes convert xylulose-5-phosphate to acetyl phosphate and glyceraldehyde-3-phosphate, and acetyl phosphate to acetate, respectively. The resulting engineered S. cerevisiae strain showed a 1.7-fold improvement in FAEE production compared with the control strain, reaching 5.1 mg/g dry cell weight [110].

In the same year, Valle-Rodríguez et al. eliminated non-essential fatty acid-consuming pathways, namely the formation of steryl esters (SEs) and triacylglycerols (TAGs), by deleting DGA1, LRO1, ARE1, and ARE2 [111]. The researchers also deleted POX1 to prevent FA degradation. They overexpressed wax ester synthase (WS) from M. hydrocarbonoclasticus DSM 8798, which yielded a final FAEE titer of up to 17.2 mg/L [111]. Recently, Eriksen et al. investigated the heterologous expression of a type I fatty acid synthase (FAS) from Brevibacterium ammoniagenes coupled with WS/DGAT [112]. They found that the strain harboring the heterologous FAS yielded a 6.3-fold higher FAEE titer than strains without it. The FAEE titer reached 10.498 mg/g DCW with overexpression of the type I fatty acid synthase genes (bafas and ppt) from B. ammoniagenes, FAA1 from S. cerevisiae, and wax ester synthase from M. hydrocarbonoclasticus [112]. However, further studies are needed to improve FAEE titers, because the titers above from engineered S. cerevisiae remain relatively low for industrial applications.


4 Chemical Production by Engineered or Evolved Yeast 


S. cerevisiae has been intensively engineered to produce non-fuel, value-added chemicals. Historically, the food and fuel industries have used S. cerevisiae for ethanol production, and scientific advances made for ethanol production by yeast can often be readily applied to non-fuel production. As mentioned in previous sections of this review, S. cerevisiae has GRAS status and its genetics have been studied extensively. Many genetic tools are therefore available [21, 22, 85] that ease the engineering of this host to produce nonconventional target products. These products include food additives, pharmaceuticals, advanced biofuels, and valuable chemicals for industrial applications.

Natively, S. cerevisiae produces numerous minor and major intermediates and metabolites, especially those of glycolysis, the pentose phosphate pathway, and the tricarboxylic acid cycle [113]. However, accumulating significant concentrations of these intermediates (or of other, non-native compounds) for industrial purposes often requires considerable engineering or evolution of S. cerevisiae. Approaches such as the Design-Build-Test-Learn cycle (Fig. 1) and tools such as CRISPR/Cas9 [22] have been applied largely to ethanol-producing yeast. They can be, and have been, readily repurposed to construct yeast strains that produce many other chemicals. These chemicals span broad categories including isoprenoids, fatty acids, organic acids, rare sugars, sugar alcohols, and others. A recent tour de force of S. cerevisiae engineering came from Galanie et al. The group required 23 enzymes from bacteria, mammals, plants, and yeast to produce a tiny amount of opioids, albeit at titers roughly five orders of magnitude below what industrial scale-up would require [114]. Nevertheless, this work points to a future for yeast biotechnology in which a single biosynthetic pathway can create downstream products that would otherwise require multiple chemical catalysis steps (Fig. 5 and Table 2).


4.1 Chemical Production by S. cerevisiae 
2,3-Butanediol 


2,3-Butanediol (2,3-BD) is an increasingly popular target chemical because of its wide applications in synthesizing diverse products such as pharmaceuticals, cosmetics, and industrial solvents. Because 2,3-BD is mostly produced by pathogenic bacteria, these natural producers are difficult to use in industrial fermentations.
Fig. 5 A diagram of non-fuel chemicals that can be produced by Saccharomyces cerevisiae via
native or heterologous (blue text) pathways 


S. cerevisiae produces 2,3-BD naturally, but only at very low concentrations, because ethanol is its major fermentative end product. Researchers have therefore engineered S. cerevisiae to reach higher 2,3-BD titers by eliminating ethanol production through disruption of the alcohol dehydrogenase genes (ADH1, ADH3, and ADH5). Ng et al. achieved a titer of 2.29 g 2,3-BD/L with a yield of 0.113 g/g glucose [115]. Kim et al. further eliminated competing pathways by deleting all three pyruvate decarboxylase genes (PDC1, PDC5, and PDC6), generating a Pdc-deficient mutant to improve the 2,3-BD titer [116]. However, Pdc-deficient mutants have defects such as slow growth, and they require supplementation with acetate or ethanol as a carbon source. The Pdc-deficient
mutants also suffered from redox imbalance because of glucose repression. The 
Table 2 Biobased chemicals from Saccharomyces cerevisiae
Product | Substrate | Result | Genetic modification(s) | Reference
2,3-BDO | Glucose | 2.29 g/L titer | ADH1Δ ADH3Δ ADH5Δ | [113]
2,3-BDO | Glucose | 72.9 g/L titer | B. subtilis AlsS and AlsD, L. lactis NoxE, and S. cerevisiae overexpression with ADH1Δ ADH2Δ ADH3Δ ADH4Δ ADH5Δ GPD1Δ GPD2Δ | [118]
Hydrocodone | Glucose | ~0.3 μg/L titer | Expression of 23 genes encoding various enzymes, overexpression of two native genes, and inactivation of one native gene | [114]
Geraniol | Glucose | 5 mg/L titer | ERG20 mutation and O. basilicum monoterpene synthase expression | [122]
Cineole | Galactose | 1 g/L titer | Overexpression of HMG2, ERG20, and IDI with expression of two genes encoding terpene synthases from S. fruticosa and S. pomifera | [125]
Bisabolene | Glucose and galactose | >900 mg/L titer | Overexpression of ERG10, IDI, ERG20, tHMGR, and Upc2-1 with A. grandis BIS expression | [126]
Bisabolene | Glucose or galactose | 5.2 g/L titer | Deletion of YJL062W and YPL064W | [127]
Taxadiene | Glucose | 8.7 mg/L titer | Expression of codon-optimized T. chinensis TDS, S. acidocaldarius GGPPS, mUpc2-1, and truncated HMG-CoA reductase isoenzyme 1 | [128]
Miltiradiene | Glucose | 488 mg/L titer | Expression of copalyl diphosphate synthase; overexpression of a truncated HMG-CoA reductase and mUpc2-1; and overexpression of an ERG20-BTS1 fusion gene together with S. acidocaldarius GGPS | [130]
Artemisinic acid | Glucose and galactose | 2.5 g/L titer | Multiple mevalonate pathway modifications, galactose as an inducer, and PMET3 promoter controlling ERG9 | [5]
Amorpha-4,11-diene | Glucose | 40 g/L titer | Overexpression of every mevalonate pathway enzyme through ERG20 and an optimized fermentation process | [136]
Lactic acid | Glucose | 81.5% tm | Bovine LDH and PDC1Δ PDC5Δ | [139]
Lactic acid | Glucose and xylose | 69% tm | R. oryzae ldhA with S. stipitis XYL1, XYL2, and XYL3 balanced expression and PHO13Δ ALD6Δ | [140]
Lactic acid | Glucose, xylose, and cellobiose | 66% tm | R. oryzae ldhA with S. stipitis XYL1, XYL2, XYL3 and N. crassa cdt-1 and gh1-1 balanced expression with PHO13Δ ALD6Δ | [141]
Itaconic acid | Glucose | 168 mg/L titer | A. terreus CAD with GPD promoter, ADE3Δ BNA2Δ TES1Δ | [148]


(continued)


196 T.L. Turner et al. 


Table 2 (continued)

Product | Substrate | Result | Genetic modification(s) | Reference
Succinic acid | Glucose | 12.97 g/L titer | Cytosolic retargeting of MDH3, FRDS1, and E. coli FumC with PYC2 overexpression and GPD1Δ FUM1Δ | [152]
Succinic acid | Glucose | 43-fold increase in yield | SDH3Δ SER3Δ SER33Δ and directed evolution | [154]
Glycolic acid | Xylose and ethanol | ~1 g/L titer | A. thaliana GLYR1 and MLS1Δ IDP2Δ with ICL1 and XR/XDH/XK xylose utilization pathway expression | [156]
Xylitol | Xylose and cellobiose | ~100% tm | S. stipitis XYL1, N. crassa cdt-1 and gh1-1 expression with ALD6, IDP2, and ZWF1 overexpression | [166]
Xylitol | Glucose and xylose | ~100% tm | Two XYL1 genes, ZWF1, and ACS1 expression with fed-batch optimization | [167]

tm theoretical maximum


researchers identified a point mutation (A81P) in Mth1, a transcriptional regulator
involved in glucose sensing, that was necessary for glucose tolerance. They also
introduced a bacterial 2,3-BD pathway from Bacillus subtilis: acetolactate synthase
(alsS) and acetolactate decarboxylase (alsD) convert pyruvate into α-acetolactate
and then acetoin, respectively, and acetoin is then reduced to 2,3-BD by butanediol
dehydrogenase (BDH1). Finally, the engineered S. cerevisiae produced up to
96.2 g/L 2,3-BD in fed-batch fermentation, with a yield of 0.28 g/g glucose [116].

Recently, Kim et al. attempted to minimize glycerol byproduct formation by
decreasing the intracellular NADH/NAD+ ratio through expression of NADH oxidase
(noxE) from L. lactis, and the resulting engineered yeast strain produced
2,3-BD with a yield of 0.359 g/g glucose [117]. Using a similar approach, Kim and
Hahn minimized glycerol production in engineered S. cerevisiae through the
additional deletion of the glycerol-3-phosphate dehydrogenase genes GPD1 and GPD2,
creating a strain that produced up to 72.9 g/L 2,3-BD in a fed-batch
fermentation, with a yield of up to 0.41 g/g glucose [118].
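To put these yields in context, they can be compared with the stoichiometric maximum. The following Python sketch is an illustrative calculation, not taken from the cited studies; it assumes the bacterial route described above (glucose to two pyruvate, then α-acetolactate, acetoin, and 2,3-BD), which gives at most 1 mol 2,3-BD per mol glucose (about 0.50 g/g) and ignores biomass and by-product formation. The same balance also suggests why glycerol forms: glycolysis yields two NADH per glucose, but the final reduction by BDH1 reoxidizes only one.

```python
# Illustrative calculation (not from the cited studies): reported 2,3-BD
# yields versus the stoichiometric maximum. Assumed overall route:
#   glucose -> 2 pyruvate (+2 NADH) -> alpha-acetolactate + CO2
#           -> acetoin + CO2 -> 2,3-BD (-1 NADH, via BDH1)
# i.e. at most 1 mol 2,3-BD per mol glucose, leaving 1 surplus NADH that
# glycerol formation (or NADH oxidase) must reoxidize.

MW_GLUCOSE = 180.16  # g/mol, C6H12O6
MW_BDO = 90.12       # g/mol, 2,3-butanediol, C4H10O2
MOL_BDO_PER_MOL_GLUCOSE = 1.0

# Maximum mass yield in g 2,3-BD per g glucose (about 0.50 g/g).
Y_MAX = MOL_BDO_PER_MOL_GLUCOSE * MW_BDO / MW_GLUCOSE

# Yields (g/g glucose) quoted in the text, keyed by reference.
REPORTED = {
    "[115] ADH disruption": 0.113,
    "[116] Pdc-deficient, bacterial pathway": 0.28,
    "[117] noxE expression": 0.359,
    "[118] noxE, GPD1/GPD2 deletion": 0.41,
}


def percent_of_theoretical(yield_g_per_g: float) -> float:
    """Express an observed mass yield as a percentage of Y_MAX."""
    return 100.0 * yield_g_per_g / Y_MAX


if __name__ == "__main__":
    print(f"theoretical maximum: {Y_MAX:.3f} g/g")
    for label, y in REPORTED.items():
        print(f"{label}: {y:.3f} g/g ({percent_of_theoretical(y):.0f}% of maximum)")
```

Under these assumptions, the 0.41 g/g reported in [118] corresponds to roughly 82% of the stoichiometric ceiling.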


Isoprenoids 


Isoprenoids, also known as terpenes, are a diverse group of chemical compounds
typically used as medicines, cosmetics, nutritional supplements, and food additives,
or even as potential future biofuels [119]. S. cerevisiae harbors natural metabolic
pathways to produce certain isoprenoids, although yields and productivities are
very poor [120]. Despite the poor natural production, isoprenoids are of great
interest because of their diverse structures and wide range of potential uses.
Monoterpenes (C10) and sesquiterpenes (C15) are two of the main candidates for jet
fuel and biodiesel alternatives because of their low freezing points and high
ignition stability. For isoprenoid production in yeast, acetyl-CoA supply is
highly important because all yeast isoprenoids are derived from the mevalonate
pathway, which starts from acetyl-CoA [121-123]. In general, isoprenoids are
synthesized through either the bacterial 1-deoxy-D-xylulose 5-phosphate (DXP)
pathway or the eukaryotic/archaeal mevalonate (MVA) pathway. Both pathways end with
the formation of the five-carbon monomers dimethylallyl pyrophosphate (DMAPP) and
isopentenyl pyrophosphate (IPP). DMAPP and IPP are then condensed and modified by
prenyltransferases to form isoprenoid precursors such as geranyl pyrophosphate
(GPP, C10) and farnesyl pyrophosphate (FPP, C15) [124].
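The carbon bookkeeping behind these precursor names can be sketched in a few lines. The Python below is a simplified illustration, not a model of yeast metabolism: it assumes strictly head-to-tail condensations in which each prenyltransferase step adds one C5 IPP unit to the growing allylic diphosphate, starting from DMAPP. The terpene classes it lists are those discussed in this section (monoterpenes from GPP, sesquiterpenes from FPP, and diterpenes from geranylgeranyl diphosphate, GGPP).

```python
# Simplified illustration of head-to-tail prenyl chain elongation:
# DMAPP (C5) + n x IPP (C5) -> C(5 + 5n) allylic diphosphate.
# Enzymes are not modeled; in yeast, ERG20 (FPP synthase) and BTS1
# (GGPP synthase) catalyze these steps, as discussed in the text.

PRECURSOR = {5: "DMAPP", 10: "GPP", 15: "FPP", 20: "GGPP"}
TERPENE_CLASS = {10: "monoterpenes", 15: "sesquiterpenes", 20: "diterpenes"}


def elongate(n_ipp: int) -> list[tuple[int, str]]:
    """Return (carbon count, precursor name) after each IPP addition to DMAPP."""
    if n_ipp < 0:
        raise ValueError("n_ipp must be non-negative")
    carbons = 5  # start from DMAPP
    chain = [(carbons, PRECURSOR[carbons])]
    for _ in range(n_ipp):
        carbons += 5  # one IPP condensation adds five carbons
        chain.append((carbons, PRECURSOR.get(carbons, f"C{carbons} prenyl-PP")))
    return chain


if __name__ == "__main__":
    for carbons, name in elongate(3):
        product = TERPENE_CLASS.get(carbons, "starter unit")
        print(f"C{carbons:<2} {name:<5} -> {product}")
```

The fixed C5 increment is why terpene classes come in multiples of five carbons (C10, C15, C20).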

Monoterpenes (C10) are derived from GPP by monoterpene synthases. Fischer
et al. were the first to produce geraniol, a monoterpene alcohol, in S. cerevisiae,
reaching a titer of up to 5 mg/L by mutating ERG20 (farnesyl pyrophosphate
synthase) and overexpressing a heterologous geraniol synthase (a monoterpene
synthase) from Ocimum basilicum [122]. To improve monoterpene biosynthesis,
Ignea et al. overexpressed the yeast sterol biosynthesis pathway genes HMG2,
ERG20, and IDI1 and co-expressed two terpene synthases (cineole synthases)
from Salvia fruticosa and Salvia pomifera. The final cineole titer was
up to 1 g/L [125].

Sesquiterpenes (C15) are another isoprenoid-derived potential fuel source that
has recently gained interest for several industrial applications. Bisabolene, a
precursor of bisabolane, was produced at a titer of over 900 mg/L in engineered
S. cerevisiae by Peralta-Yahya et al. [126]. The yeast was first engineered by
overexpression of acetyl-CoA acetyltransferase (ERG10), isopentenyl diphosphate
isomerase (IDI), farnesyl pyrophosphate synthase (ERG20), truncated
HMG-CoA reductase (tHMGR), and the transcriptional regulator of the sterol
pathway (Upc2-1). The researchers then examined six different bisabolene synthases
isolated from Arabidopsis thaliana, Picea abies, Pseudotsuga menziesii, and Abies
grandis, and obtained the highest titer with the codon-optimized
bisabolene synthase (BIS) from A. grandis [126]. More recently, Ozaydin et al.
screened the S. cerevisiae deletion collection for carotenoid production and
constructed a strain producing up to 5.2 g/L bisabolene through double
deletion of YJL064W and YPL062W [127].

Several diterpenes (C20) have also been produced by engineered yeast. In 2008,
a titer of 8.7 mg/L of taxadiene was achieved from engineered S. cerevisiae
[128] through two general metabolic modifications:
(1) coexpression of a codon-optimized Taxus chinensis taxadiene synthase and a
Sulfolobus acidocaldarius geranylgeranyl diphosphate synthase and (2) expression
of a truncated 3-hydroxy-3-methylglutaryl-CoA reductase isoenzyme and a mutant
regulatory protein, UPC2-1, which allows sterol uptake under aerobic conditions. In
2012, miltiradiene, another diterpene, was overproduced through metabolic
engineering of S. cerevisiae. Zhou et al. achieved a peak miltiradiene titer of
365 mg/L [129], and Dai et al. obtained 488 mg/L in a fed-batch fermentation
[130]. The 488 mg/L titer was achieved through several metabolic engineering
and fermentation technology steps: (1) overexpression of a mutated global
regulatory factor (upc2.1) and a truncated 3-hydroxy-3-methylglutaryl-CoA
reductase (tHMGR), (2) expression of copalyl diphosphate synthase,
(3) overexpression of a fusion gene of farnesyl diphosphate synthase (ERG20)
and an endogenous geranylgeranyl diphosphate synthase (BTS1), together with a
geranylgeranyl diphosphate synthase from Sulfolobus acidocaldarius (SaGGPS),
and (4) use of a fed-batch fermentation [130].

Artemisinin is a sesquiterpene lactone that gained prominence as an
antimalarial drug following its discovery by You-You Tu in the 1970s
[131, 132]. Unfortunately, isolation of artemisinin from its natural plant source
and its industrial production are not always reliable, and shortages of this vital
drug have been reported [133]. Production of artemisinin by a reliable and
sustainable microbial cell factory could be a viable alternative, and several labs
have worked to construct such a process. An important precursor for artemisinin
production, amorpha-4,11-diene, was produced by Lindahl et al. in 2006 [134] by
subcloning the amorpha-4,11-diene synthase gene from Artemisia annua into a
galactose-inducible, high-copy-number pYeDP60 plasmid and transforming the
plasmid into an S. cerevisiae strain. Although further optimization was needed
before industrial-scale application, the final titer of 600 μg/L was an
important step toward microbial production of artemisinin.

Within a year of the report on amorpha-4,11-diene production, a process for
producing artemisinic acid from engineered yeast was published. Artemisinic acid
is the immediate precursor of artemisinin and can be converted to artemisinin by
further chemical synthesis. In their report, Ro et al. achieved a peak titer of
~100 mg/L of artemisinic acid [135]. Many engineering steps were necessary to
achieve this production in S. cerevisiae, broadly including increasing
farnesyl pyrophosphate (FPP) production and reducing its use for sterols,
expressing the amorphadiene synthase gene from A. annua in the improved
FPP-producing strain, and cloning a novel cytochrome P450 that performs a
three-step oxidation of amorphadiene to artemisinic acid.

More recently, significant boosts in the production of both amorpha-4,11-diene
and artemisinic acid from engineered S. cerevisiae have been reported. Lenihan
et al. produced a titer of 2.5 g/L of artemisinic acid from an engineered
S. cerevisiae by using a defined medium containing galactose as a carbon source and
inducer in a fed-batch process with precisely controlled agitation and feed pump
rates [5]. A Pmet3 promoter was used to control ERG9, which improved precursor
availability for artemisinic acid synthesis by limiting sterol synthesis. Later,
Westfall et al. hypothesized that high titers of artemisinic acid may be
unachievable without improving the supply of necessary precursors [136], such as
the previously discussed amorpha-4,11-diene. Overexpression of every mevalonate
pathway enzyme up to and including ERG20, combined with fermentation optimization,
resulted in a considerable titer of 40 g/L amorpha-4,11-diene [136].
Organic Acids


Organic acids are widely used for many applications, including as food
additives. They also serve as building blocks of many larger polymers, which are
produced through one or more steps of chemical catalysis. For example, lactic
acid is produced by engineered S. cerevisiae expressing lactate dehydrogenase
(ldh), and polylactic acid (also known as polylactide; PLA) can be produced from it
by catalysis [137]. PLA is a renewable and biodegradable polyester used for many
purposes, including as a filament for 3D printing, for producing medical
screws/implants, and for producing plastic dinnerware. Numerous studies have been
conducted on producing lactic acid from engineered S. cerevisiae using a variety
of feedstocks, including glucose [138, 139], xylose [140], and cellobiose [141].
Currently, no study using engineered yeast has achieved the theoretical
maximum of lactic acid production from glucose, xylose, cellobiose, or a mixture of
these carbon sources, so work is ongoing to improve these fermentation processes.
Among the studies that have generated lactic acid-producing S. cerevisiae, a variety
of ldh sources have been used, including bovine [142, 143], Rhizopus oryzae
[140, 141, 144], Bifidobacterium longum [142], and Lactobacillus plantarum
[145, 146]. Moving forward, expression of ldh genes from yet-unstudied sources in
S. cerevisiae may prove useful for producing specific ratios of L- or D-lactic acid,
which can be beneficial for specific industrial applications.
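Because lactic acid results are often reported as a percentage of the theoretical maximum (tm, as in Table 2), it may help to show how such a ceiling can be computed. The following Python sketch assumes complete conversion of substrate carbon to lactate (2 mol per mol glucose, 5/3 mol per mol xylose, and 4 mol per mol cellobiose after hydrolysis to two glucose units) and ignores biomass, maintenance, and cofactor imbalances such as those of the XR/XDH xylose pathway; the cited studies may define the maximum differently. The helper names and the substrate amounts in the usage example are hypothetical.

```python
# Hypothetical helper: lactic acid yield as a percentage of the theoretical
# maximum (tm). Assumes all substrate carbon ends up in lactate (C3H6O3);
# biomass, maintenance, and cofactor imbalance are ignored.

MW = {  # g/mol
    "lactate": 90.08,
    "glucose": 180.16,
    "xylose": 150.13,
    "cellobiose": 342.30,
}
MOL_LACTATE_PER_MOL = {
    "glucose": 2.0,       # C6 -> 2 x C3
    "xylose": 5.0 / 3.0,  # C5 -> 5/3 x C3 (carbon balance only)
    "cellobiose": 4.0,    # two glucose units -> 4 x C3
}


def max_yield(substrate: str) -> float:
    """Theoretical maximum lactate yield in g per g of the given substrate."""
    return MOL_LACTATE_PER_MOL[substrate] * MW["lactate"] / MW[substrate]


def percent_tm(lactate_g: float, consumed_g: dict[str, float]) -> float:
    """Observed lactate as % of the maximum for a (mixed) substrate consumption."""
    ceiling = sum(max_yield(s) * grams for s, grams in consumed_g.items())
    if ceiling <= 0:
        raise ValueError("no substrate consumed")
    return 100.0 * lactate_g / ceiling


if __name__ == "__main__":
    for s in ("glucose", "xylose", "cellobiose"):
        print(f"{s}: {max_yield(s):.3f} g/g")
    # Hypothetical fermentation: 40 g glucose + 20 g xylose consumed, 50 g lactate.
    print(f"{percent_tm(50.0, {'glucose': 40.0, 'xylose': 20.0}):.1f}% tm")
```

Because lactate has the same empirical formula (CH2O) as the monosaccharides, the mass-based ceiling is about 1 g/g for both glucose and xylose, and slightly higher for cellobiose, which takes up water on hydrolysis.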

Because itaconic acid has many industrial uses, including serving as a comonomer
for producing plastics and rubbers [147], this compound is another interesting
organic acid that has recently been produced at laboratory scale in engineered
S. cerevisiae. To achieve a peak titer of 168 mg/L itaconic acid in
S. cerevisiae, several metabolic engineering steps were implemented [148]. First,
the cis-aconitic acid decarboxylase gene (CAD) from Aspergillus terreus
was expressed in S. cerevisiae under the control of a strong “Enhanced”
GPD promoter. Second, several gene targets, including ADE3, BNA2, and TES1,
were identified with a genome-wide stoichiometric model, deleted, and assessed for
improvements in itaconic acid production. Finally, the triple deletion strain
expressing the A. terreus CAD was grown under optimized fermentation conditions,
including high cell density, to reach the peak titer of 168 mg/L itaconic acid.
However, scale-up to a cost-effective and efficient industrial-scale process
requires further optimization, as a titer of more than 80 g/L itaconic acid is
considered necessary [148].

Like itaconic acid, muconic acid is a platform chemical that can act
as a precursor for many useful products, including various
renewable plastics [149]. The first reported instance of muconic acid production
by engineered S. cerevisiae was in 2012, with a peak titer of approximately
1.56 mg/L [150]. By 2013, however, several metabolic engineering
improvements allowed production of 141 mg/L muconic acid from engineered
S. cerevisiae [151]. Several steps were needed to obtain this result. First,
Candida albicans catechol 1,2-dioxygenase, Enterobacter cloacae protocatechuic
acid decarboxylase, and Podospora anserina dehydroshikimate dehydratase were
expressed in S. cerevisiae. Then ARO3 was deleted and a feedback-resistant ARO4
mutant was expressed to reduce feedback inhibition of the shikimate pathway.
Next, ZWF1 was deleted and TKL1 was overexpressed to increase precursor flux into
the target pathway. Finally, the levels of several heterologous enzymes were
balanced, resulting in the final titer of 141 mg/L muconic acid [151].

Succinic acid is a value-added organic acid that can be overproduced by
engineered yeast [152-154]. Like lactic acid, succinic acid can be used as a
precursor of several polyesters [155]. Furthermore, succinic acid is designated as
GRAS (generally recognized as safe) by the U.S. Food and Drug Administration, which
has allowed its use in the food industry as an acidity regulator. Because succinic
acid is an intermediate of the citric acid cycle (tricarboxylic acid cycle), yeast
natively produces it under aerobic conditions, but overproduction requires multiple
genetic perturbations. For example, Otero et al. constructed an engineered
S. cerevisiae with deletions of SDH3, SER3, and SER33 to reduce succinate
consumption and to interrupt glycolysis-derived serine biosynthesis
[154]. The resulting engineered yeast displayed a 30-fold improvement in succinic
acid titer and a 43-fold improvement in succinic acid yield compared with the
control strain.

Beyond succinic acid, glycolic acid, a C2 hydroxy acid, has gained attention in
recent years. Global glycolic acid production in 2011 was approximately
40,000,000 kg and was expected to more than double by 2018 [156]. Glycolic
acid is often used as a building block of polyglycolate, a polymer used as a
packaging material because of its gas-barrier properties and mechanical
strength. However, most glycolic acid is produced by a chemical process that
relies on non-renewable fossil resources [156]. As an alternative, a biological
route exists in which glyoxylate is converted into glycolic acid by glyoxylate
reductase. Successful overproduction of glycolic acid therefore requires efficient
glyoxylate reductase activity in engineered S. cerevisiae. A further improvement,
up to approximately 1 g/L glycolic acid, can be achieved by deleting the malate
synthase (MLS1) and cytosolic isocitrate dehydrogenase (IDP2) genes [156]. As
organic acid production by S. cerevisiae continues to improve, it is likely that
new, rare, or hard-to-obtain organic acids will be produced in laboratories by
engineered S. cerevisiae strains.


Rare Sugars, Sugar Alcohols, and Antioxidants 


Sugars such as L-ribose, D-allose, D-tagatose, and D-psicose are classified as rare
sugars. As the name implies, these sugars are rarely found in nature, but they have
beneficial health properties. L-Ribose, for example, is considered an important
intermediate for producing pharmaceutical and food chemicals
[157, 158]. Although D-ribose is very common in nature, L-ribose is not found in


Engineering and Evolution of Saccharomyces cerevisiae to Produce... 201 


nature based on current knowledge. The driving demand for L-ribose production is 
its potential as a building block for L-nucleoside-based pharmaceutical compounds. 
L-Nucleoside-based compounds or analogs play an important role in treating viral 
infections and cancers [159]. Currently, research regarding rare sugar production by 
engineered yeast is very limited. 

Sugar alcohols such as erythritol, xylitol, and sorbitol are in high demand in the
food industry because they provide sweetness without causing dental caries
[160]. Although sugar alcohol production is generally difficult, one advantage is
that most sugar alcohols are not fermentable by S. cerevisiae, which limits
reuptake by engineered yeast designed to overproduce them. Interest in
producing sugar alcohols dates back more than 50 years, with at least one
study investigating D-arabitol production in Saccharomyces spp. [161]. More
recently, a minute titer of 44 μg/mL mannitol was produced by expressing
multiple copies of the E. coli mannitol-1-phosphate dehydrogenase gene (mtlD)
in S. cerevisiae [162]. Costenoble et al. later improved on this titer,
producing nearly 400 mg/L mannitol in engineered S. cerevisiae under
anaerobic conditions [163]. This was achieved primarily by expressing
E. coli mtlD in an S. cerevisiae strain with deletions of GPD1 and GPD2, followed
by an oxygen-sparged fermentation that was switched to nitrogen sparging
during the exponential growth phase.

Xylitol, one of the best-known sugar alcohols, has a sweetening power similar to
that of sucrose, but it does not contribute to dental caries and has a cooling
effect when eaten. A chemical hydrogenation process to produce xylitol has
existed for decades [164], but more recently, several groups have obtained high
xylitol titers and yields from engineered yeast systems [165-167]. Oh
et al. produced xylitol rapidly and efficiently using an engineered
S. cerevisiae expressing xylose reductase (XYL1), a cellodextrin transporter
(cdt-1), and an intracellular β-glucosidase (gh1-1), which enabled simultaneous
utilization of xylose and cellobiose [166]. As a result, the engineered strain
produced xylitol at the maximum theoretical yield by co-utilizing xylose and cellobiose.

Because antioxidants have been considered potentially beneficial as dietary
supplements, there has been increased interest in efficiently producing
these compounds from a consistently obtainable source rather than depending on
extraction from seasonally available produce. Resveratrol is one compound of
interest, as it is a common component of grape skins and of wines made
from these skins [168]. Many studies on engineering S. cerevisiae
and other microbes for microbe-based resveratrol production have been
published in recent years [169-172]. In one example, an engineered S. cerevisiae
expressing a codon-optimized bacterial tyrosine ammonia lyase and an E. coli
high-capacity, low-affinity arabinose transporter (araE) produced a peak resveratrol
concentration of 3.44 mg/L at 48 h in a laboratory-scale grape juice fermentation
[172]. This result is an important step from an industrial standpoint, as it
represents a method to increase the resveratrol concentration in white wine, which
in most cases has a significantly lower resveratrol concentration than red wine.
As with resveratrol, glutathione is another antioxidant whose production by
engineered S. cerevisiae has been extensively studied [173-178].
Microbe-based production is currently the primary industrial process
for glutathione synthesis, although glutathione can also be produced by chemical
synthesis [179]. Because the microbe-based process is the major method of
industrial-scale production, many processes to improve titer, yield, and
productivity have been explored. Recently, a titer of 320 mg/L glutathione was
achieved by a laboratory-evolved S. cerevisiae strain in an acrolein-containing
medium [178]. Acrolein is an aldehyde that is toxic to yeast cells [180], and
glutathione has been shown to act as a defense against acrolein toxicity, suggesting
that cells with increased resistance to acrolein may be overproducing
glutathione [181]. Based on this knowledge, several S. cerevisiae strains were
evolved over 250 generations on increasing concentrations of acrolein. Finally,
S. cerevisiae strain A4-19 was isolated; it displayed glucose consumption, growth,
and ethanol production rates similar to those of the parental A4 strain, yet had
increased acrolein resistance and a glutathione titer of 320 mg/L, approximately
twofold higher than that of the parental strain [178].


5 Current Scope and Future Outlook of Industrial Fuel 
and Chemical Production by Yeast 


As discussed in Sects. 3 and 4, many advances have been made in recent years in
yeast metabolic engineering and synthetic biology for the purpose of biofuel and
renewable chemical production. Collectively, these new technologies have resulted
in S. cerevisiae strains capable of fermenting a variety of substrates, such as xylose
and cellobiose, with improved target product yields and productivities. However,
only a fraction of these laboratory developments have been implemented at an
industrial scale because of prohibitive costs, difficulty in scale-up, or low yields
and productivities. For industrial-scale biofuel production, S. cerevisiae is the
primary yeast species in use, although lab-scale biofuel production by
non-S. cerevisiae yeasts, such as Yarrowia lipolytica and Schizosaccharomyces
pombe, has grown in recent years [182, 183]. By contrast, several
non-S. cerevisiae microbes are used for industrial chemical production because of
the wide range of target chemicals produced by the biobased chemical industry.
Although S. cerevisiae is extremely hardy and can be easily engineered, other
microbes are sometimes preferred for a target product. Perhaps the most
notable example is the use of engineered E. coli for the production of recombinant
insulin [184]; over 150 recombinant therapeutics have been approved by the
European Medicines Agency [185]. However, only approximately one-third of
approved therapeutics use engineered E. coli, with S. cerevisiae and other yeasts
also accounting for a significant portion of industrial therapeutics, fuels, and
chemicals [185].
At the industrial scale, ethanol is the major biofuel target, especially for
engineered S. cerevisiae [186, 187]. Ethanol is commonly used as a fuel additive
to create gasoline-ethanol blends. Ethanol blending in the United States has grown
from less than 5 vol% to over 10 vol% in the past decade
[188]. This growth is at least partially attributable to the United States
Environmental Protection Agency’s Renewable Fuel Standard, which requires up to
17.4 billion gallons of renewable fuel production by 2016, of which 0.21 billion
gallons must be cellulosic biofuel [189]. The total production requirement for
renewable fuels can increase to 36 billion gallons by 2020 [190].

To achieve the renewable fuel standards set by the United States and other
governments, industrial fuel producers have used S. cerevisiae as their platform
microbial strain of choice. As of 2014, approximately 23.8 billion gallons of
ethanol were produced annually worldwide, almost entirely by S. cerevisiae
fermentation [28]. The United States and Brazil are responsible for the vast
majority of global bioethanol production, annually producing 14.3 billion gallons
and 6.2 billion gallons, respectively [28]. In the United States, corn serves as the
primary feedstock, whereas in Brazil, sugarcane is the major feedstock for
bioethanol production [191, 192].
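The phrase “vast majority” can be made concrete with the 2014 figures quoted above. This short Python sketch simply divides each national volume by the global total; it uses only the numbers given in the text [28] and assumes that all three figures refer to the same year and the same gallon unit.

```python
# Share of 2014 global fuel-ethanol production, using only the volumes
# quoted in the text (billion gallons per year, reference [28]).

GLOBAL_BILLION_GAL = 23.8
NATIONAL_BILLION_GAL = {"United States": 14.3, "Brazil": 6.2}


def share_percent(volume: float, total: float = GLOBAL_BILLION_GAL) -> float:
    """Percentage of the global total represented by one producer's volume."""
    return 100.0 * volume / total


if __name__ == "__main__":
    for country, volume in NATIONAL_BILLION_GAL.items():
        print(f"{country}: {share_percent(volume):.1f}%")
    combined = sum(NATIONAL_BILLION_GAL.values())
    print(f"combined: {share_percent(combined):.1f}%")
```

On these figures, the two countries together account for roughly 86% of global production, with the United States alone at about 60%.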

As the two major bioethanol-producing countries, both nations have considerable
motivation for the success of their respective ethanol industries. In Brazil,
ethanol serves as a transportation fuel at nearly a 1:1 ratio with gasoline [193]. In
the United States, roughly 40% of the corn produced is used for ethanol
production [194]. Both nations protect their bioethanol industries through tax
breaks, subsidies, or increased tariffs on imported ethanol. Moving forward, these
economic benefits are expected to shift away from first-generation biofuels (using
corn and sugarcane juice as the feedstock) toward second-generation biofuels (using
corn stover, switchgrass, and miscanthus). As governments and environmental
protection groups provide further incentives for renewable biofuel production by
engineered yeast, scientific advances developed for producing fuel can be modified
and applied to the production of non-fuel chemicals by engineered yeast. However,
despite legislation in the United States and elsewhere to encourage biofuel
production, no equivalent guidelines exist to provide incentives specifically for
biobased, non-fuel chemical production. A global effort to limit the increase in
average global surface temperature to no more than 2 °C above late-nineteenth-century
levels by reducing greenhouse gas emissions has provided a minor incentive for
renewable chemical production [195]. The influence of this effort on biobased
chemicals is small because less than 10% of total fossil fuels is used for chemical
production, with the vast majority going to the energy and transportation fuel
industries [196, 197].

Since early 2014, global oil prices have fallen rapidly and dramatically
[198]. Unsurprisingly, as fossil fuel costs decrease, the production of biofuels
and renewable chemicals becomes less economically viable. Not only are
second-generation (lignocellulosic) biofuels at economic risk, but even the
currently more cost-effective first-generation biofuels become difficult to produce
cost-effectively. Reboredo et al. suggest that “huge state subsidies” would be
needed to maintain viable biofuel production amid the crashing oil prices
[199]. Although the short-term outlook for biofuel and renewable chemical
production is uncertain, the continuing volatility of oil prices is anticipated
to encourage further research into efficient, economical, and renewable biofuel
and chemical production.

In addition to the many companies that produce renewable fuels or fuel
additives, many companies worldwide employ microbial fermentation to produce
non-fuel, renewable chemicals. In many cases, the microbial species used or the
precise metabolic engineering protocol is not fully disclosed. However, some of the
more notable companies using a yeast-based fermentation platform include DSM,
Verdezyne, BioAmber, Amyris, and NatureWorks, which produce, respectively,
succinic acid [200], adipic acid [201], 1,4-butanediol [202], farnesene [203], and
lactic acid [204].


6 Conclusion 


Driven by rapid advances in metabolic engineering, synthetic biology, and
genomics, the production of fuels and non-fuel chemicals by engineered
S. cerevisiae has developed tremendously. Several of these advances have
transitioned to industrial-scale fermentation processes, allowing for the sustainable
production of many valuable chemicals from renewable biomass. Despite these
advances and growing numbers of industrial examples, many barriers still exist
that can hinder further adoption of S. cerevisiae industrial fermentations.

Currently, global oil prices have reached the lowest levels in approximately a 
decade [199]. Low oil prices are a major detriment not only to the cost-effective 
production of renewable fuels and chemicals but also to consumer and government 
sentiment regarding the short-term importance of developing a renewable chemical 
industry infrastructure. Furthermore, reduced oil prices significantly lower the cost 
of petroleum-based chemicals, which places additional pressure on renewable, 
fermentation-based biochemical production. Despite these pressures, many
industrial biobased processes, such as succinic acid production (from E. coli) [205]
and bioethanol production (from S. cerevisiae) [186, 187], are still considered
feasible or even preferable to petrochemical production.

Moving forward, newer and more complex fuels and chemicals can be produced at
industrial scale by engineered S. cerevisiae as volatile oil prices and the depletion
of finite fossil fuels encourage investment in biobased alternatives. Nearly all
industrial-scale S. cerevisiae fermentations start as laboratory-scale studies
following the “Design, Build, Test, and Learn” cycle (Fig. 1), and simple
single-step metabolic pathways, such as lactic acid production by a heterologous
lactate dehydrogenase [140], can give way to complex, multi-step pathways, such as
opioid production [114]. Collectively, the impact of engineered S. cerevisiae on the
biobased fuel and chemical industries is likely to expand in the near future.
Abstract Since its discovery 60 years ago, Corynebacterium glutamicum has
evolved into a workhorse for industrial biotechnology. Traditionally well known
for its remarkable capacity to produce amino acids, this Gram-positive soil
bacterium has become a flexible, efficient production platform for various bulk and
fine chemicals, materials, and biofuels. The cornerstone of all these achievements is
our excellent understanding of its metabolism and physiology. This knowledge
base, together with innovative systems metabolic engineering concepts that
integrate systems and synthetic biology into strain engineering, has made
C. glutamicum one of the most successful industrial microorganisms in the
world.
1 Introduction


The Gram-positive soil bacterium Corynebacterium glutamicum is one of the veterans of industrial biotechnology. Its natural capability to produce and secrete glutamate in high amounts originally led to its discovery about 60 years ago [1]. From early on, its versatile metabolism, nutritional flexibility, and process robustness were major drivers for establishing and developing industrial strains and processes from scratch. Among these, amino acid production has evolved most rapidly and is today a multi-billion dollar business [2, 3]. Over the past decades, global competition among leading companies in the field has steadily demanded innovation to improve the key performance indicators: yield, titer, and productivity. For this reason, C. glutamicum has become one of the best characterized microorganisms worldwide with regard to its substrate spectrum and nutrient requirements [4], its catabolic and anabolic pathways and their regulation [5], the underlying biochemistry [6], and its response to environmental conditions [7]. With the advent of genetic engineering, this provided a detailed knowledge base for the targeted modification of enzymes and pathways to optimize established fermentation processes. More recently, powerful molecular tools for genome-based engineering, together with technologies to analyze the genome, transcriptome, proteome, metabolome, and fluxome, have enabled the next level of strain engineering: tailored optimization on a systems-wide level. Successful expression of heterologous genes in C. glutamicum has even made it possible to cross natural boundaries and pave the way to non-natural products. Systems and synthetic metabolic engineering has enabled C. glutamicum to produce a wide portfolio of products: biofuels, bulk and fine chemicals, polymer building blocks, polymers, feed additives, and products for nutrition and health care [4, 8, 9]. Underpinning all of these achievements are metabolism and physiology. Core carbon metabolism has to function properly in a successful cell factory. For maximal conversion of external carbon sources into desired products, metabolism has to keep producing cells alive and simultaneously provide energy, carbon building blocks, and redox power for biosynthesis. Without doubt, metabolism and physiology form the basis for the industrial success of C. glutamicum and deserve a close and detailed view. Accordingly, this chapter summarizes our current knowledge in this area. In addition, prominent strategies and showcases highlight the upgrade of C. glutamicum into one of the most important cell factories in white biotechnology.


2 Metabolism: Pathway Principles and Engineering Strategies


C. glutamicum is a soil-dwelling microorganism. It belongs to the Actinobacteria, the high-GC-content Gram-positive bacteria. Cells are small, non-motile, and non-spore-forming. Their shape is typically club-like, explaining the name “coryne-form” (club-shaped). The type strain, C. glutamicum ATCC 13032, possesses a circular chromosome of 3.3 Mb and a plasmid of 0.5 Mb [10]. From early on, the industrial relevance of C. glutamicum has driven the investigation of its biochemistry, with a strong focus on the pathways of core metabolism, and their regulation, that convert substrates of interest into products of interest. This provides a highly detailed portrait of C. glutamicum, which substantially guides metabolic engineering approaches.


2.1 Carbon Core Metabolism 
2.1.1 Substrate Uptake 


C. glutamicum is able to use a variety of carbon sources as growth and energy 
substrates, including sugars [11, 12], sugar alcohols [13], and organic acids [14-18]. 


Hexoses 


Sugar uptake in C. glutamicum is mediated by phosphotransferase systems (PTS), first described by Mori and Shiio [19]. During transport, the sugar is phosphorylated at the expense of phosphoenolpyruvate (PEP) (Fig. 1). For C. glutamicum, four PTS variants have been reported, specific for glucose, fructose, sucrose, and mannose [20]. All systems consist of three distinct proteins: enzyme I (EI), histidine protein (HPr), and enzyme II (EII). EII mediates substrate specificity, and the corresponding protein variants are encoded by ptsG, ptsF, and ptsS for the uptake of glucose, fructose, and sucrose, respectively [21, 25]. The two general components EI and HPr are encoded by the genes ptsI and ptsH. A residual glucose assimilation activity observed in ptsG null mutants has revealed the existence of an additional PTS-independent glucose uptake system. Such uptake requires intracellular phosphorylation of glucose. Indeed, a glucokinase (glk) is present in C. glutamicum [25, 26] and might contribute up to 15% of total glucose uptake [27]. Recently, two myo-inositol transporters, encoded by iolT1 and iolT2, were identified, both of which mediate glucose uptake in C. glutamicum [28, 29]. Metabolic engineering of C. glutamicum toward PTS-independent glucose uptake was successfully applied to improve lysine [30] and succinate production [31]. Similarly, an additional uptake system was suggested for fructose, based on the residual growth of a ptsF null mutant of C. glutamicum on this sugar [32, 33]. In this case, the mannose PTS was found to also transport fructose (Fig. 1). Metabolic flux analysis revealed that the mannose PTS accounts for a relative fructose uptake flux of 8%, whereas 92% of fructose enters the cell via the fructose-specific PTS and feeds into metabolism at the level of fructose 1,6-bisphosphate [34]. This finding led to the identification of fructose 1,6-bisphosphatase activity as a bottleneck for fructose- and sucrose-based lysine production [34, 35]. This was overcome by targeted overexpression of the encoding fbp gene, which substantially improved production performance [36]. Although fructose can also be taken up by the two myo-inositol transporters [37], the lack of fructokinase activity prevents its further metabolism [38].

Fig. 1 Overview of transporters and metabolic reactions in C. glutamicum for the uptake and conversion of different industrially relevant sugars and sugar alcohols accessible from renewable biomass [6, 13, 20-24]. The transporter set comprises phosphotransferase systems for the uptake of glucose, fructose, sucrose, mannose, and β-glucosides, an ABC transporter for ribose, an H+-symporter for arabinose, permeases for gluconate and arabitol, an MFS-type transporter for mannitol, and a transporter of the major facilitator superfamily for non-PTS-mediated glucose uptake. Transporters for fructose secretion and xylose uptake have not been identified so far. ABCRib ATP-binding cassette transporter for ribose import, AraA arabinose isomerase, AraB ribulokinase, AraE arabinose H+-symporter, AraD ribulose 5-phosphate 4-epimerase, BglA phospho-β-glucosidases BglA1 and BglA2, EI general PTS component enzyme I, EIIBgl β-glucoside-specific PTS component, EIIFru fructose-specific PTS component, EIIGlc glucose-specific PTS component, EIIMan mannose-specific PTS component, EIISuc sucrose-specific PTS component, Fbp fructose 1,6-bisphosphatase, Glk glucokinase, Gnd 6-phosphogluconate dehydrogenase, GntK gluconate kinase, GntP gluconate permease, HPr histidine protein, IolT myo-inositol transporters 1 and 2, MtlD mannitol 2-dehydrogenase, MtlT mannitol transporter, PEP phosphoenolpyruvate, PfkA 6-phosphofructokinase, PfkB fructose 1-phosphate kinase, Pgi phosphoglucoisomerase, Pmi phosphomannose isomerase, RbsK ribokinases 1 and 2, RbtT ribitol transporter, Rpe ribulose 5-phosphate epimerase, Rpi ribose 5-phosphate isomerase, ScrB sucrose 6-phosphate hydrolase, XylA xylose isomerase, XylB xylulokinase
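To make the reported uptake split concrete, the sketch below partitions a total fructose uptake rate between the fructose-specific PTS (92%) and the mannose PTS (8%) reported from flux analysis [34], and tallies the PEP consumed, assuming one PEP per sugar molecule transported by either PTS. The helper name `split_fructose_uptake` and the uptake rate of 5 mmol per g cell dry mass per hour are illustrative assumptions, not values from the cited work.

```python
from dataclasses import dataclass

# Relative contributions to fructose uptake reported by flux analysis [34]
FRACTION_FRUCTOSE_PTS = 0.92  # fructose-specific PTS
FRACTION_MANNOSE_PTS = 0.08   # mannose PTS


@dataclass
class FructoseUptake:
    via_fructose_pts: float  # same unit as the total uptake rate
    via_mannose_pts: float
    pep_consumed: float      # one PEP per sugar moved by a PTS (assumption)


def split_fructose_uptake(total_uptake: float) -> FructoseUptake:
    """Partition a total fructose uptake rate between the two PTS routes."""
    if total_uptake < 0:
        raise ValueError("uptake rate must be non-negative")
    via_fru = FRACTION_FRUCTOSE_PTS * total_uptake
    via_man = FRACTION_MANNOSE_PTS * total_uptake
    # Both routes are PTS-mediated, so every transported molecule costs one PEP
    return FructoseUptake(via_fru, via_man, pep_consumed=via_fru + via_man)


# Hypothetical uptake rate: 5 mmol per g cell dry mass per hour
result = split_fructose_uptake(5.0)
print(result)
```

Because the fractions are fixed constants, this only rescales the reported split; it does not predict how the split would shift in engineered strains.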
Pentoses


Ribose is the only five-carbon sugar that is naturally utilized by a variety of C. glutamicum strains. As in most bacteria, ribose is taken up through an ATP-binding cassette (ABC) transporter [39]. The genes encoding the ribose-specific ABC transporter and its corresponding transcriptional regulator (RbsR) are organized in an operon [40]. After uptake, ribose is phosphorylated by one of the two ribokinases RbsK1 and RbsK2, which yields the pentose phosphate (PP) pathway intermediate ribose 5-phosphate (Fig. 1). Double deletion of the two encoding genes rbsK1 and rbsK2 abolishes growth on ribose as the sole carbon source [40]. Utilization of the pentose arabinose is a rare feature for C. glutamicum. The required enzymes L-arabinose isomerase (AraA), L-ribulokinase (AraB), and L-ribulose 5-phosphate 4-epimerase (AraD) are missing in most strains. As an exception, C. glutamicum ATCC 31831 possesses an araBDA operon and is able to grow on L-arabinose [41]. The upstream region of the gene cluster contains genes for a negative LacI-type transcriptional regulator (araR) and a high-affinity, arabinose-inducible H+-symporter (araE). Deletion of the latter strongly impairs growth at low arabinose concentrations, whereas high substrate concentrations support normal growth, indicating the presence of an additional, so far unidentified transporter. Metabolic engineering strategies for utilizing arabinose for growth and production rely on heterologous expression of the arabinose gene cluster of Escherichia coli, which makes arabinose bioavailable for the C. glutamicum type strain ATCC 13032 [42, 43]. Natural xylose users have not been described so far. However, C. glutamicum can take up xylose from the environment and harbors a functional xylB gene encoding xylulokinase [44]. Type strains lack xylose isomerase activity, the essential link that channels xylose into central carbon metabolism [44]. Given the relevance of xylose as a renewable feedstock, the xylose assimilation pathway has been reconstructed in C. glutamicum to allow growth [44] and the production of organic acids [44], proteinogenic and non-proteinogenic amino acids [45, 46], and diamines [46-48].
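Whether a strain can use a given pentose comes down to whether every enzymatic step of its catabolic route is present. The sketch below encodes the routes named above as lists of steps (ribose: RbsK1 or RbsK2; L-arabinose: AraA, AraB, AraD; xylose: XylA, XylB) and reports the steps a strain cannot catalyze. The helper `missing_steps` and the enzyme inventory are illustrative assumptions: transporters are deliberately left out, and the type-strain inventory restates the text (xylulokinase present, xylose isomerase and the Ara enzymes absent) plus the assumption that both ribokinases are present.

```python
# Each catabolic route is a list of steps; a step is satisfied if the strain
# has at least one of the listed enzymes (e.g., two alternative ribokinases).
# Transport steps are omitted to keep the example small.
ROUTES: dict[str, list[set[str]]] = {
    "ribose": [{"RbsK1", "RbsK2"}],
    "L-arabinose": [{"AraA"}, {"AraB"}, {"AraD"}],
    "xylose": [{"XylA"}, {"XylB"}],
}


def missing_steps(route: str, enzymes: set[str]) -> list[set[str]]:
    """Return the steps of a route that no enzyme of the strain can catalyze."""
    return [step for step in ROUTES[route] if not step & enzymes]


# Illustrative (hypothetical) enzyme inventory for type strain ATCC 13032
type_strain = {"RbsK1", "RbsK2", "XylB"}

for sugar in ROUTES:
    gaps = missing_steps(sugar, type_strain)
    status = "complete" if not gaps else f"needs {[sorted(s) for s in gaps]}"
    print(f"{sugar}: {status}")
```

Adding a heterologous enzyme to the inventory (for example, `type_strain | {"XylA"}`) closes the corresponding gap, mirroring the reconstruction of the xylose pathway described above.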


Gluconate and β-Glucosides


Gluconate enters the cell via a specific permease (GntP) and is subsequently phosphorylated to 6-phosphogluconate, an intermediate of the oxidative PP pathway [49]. Two GntR-type regulators, GntR1 and GntR2, control gluconate metabolism and PTS-mediated glucose uptake. In the absence of gluconate, genes involved in gluconate metabolism (e.g., gntP and gntK) are repressed, whereas transcription of ptsG and ptsS, responsible for PTS-dependent glucose and sucrose uptake, is enhanced [49]. Like arabinose utilization, the metabolism of β-glucosides is a strain-specific feature. The strain C. glutamicum R, for instance, possesses two gene clusters (bglF-bglA-bglG and bglF2-bglA2-bglG2) for the uptake and degradation of β-glucosides such as salicin, arbutin, and methyl-β-glucoside, whereas such gene clusters are not present in the genome of C. glutamicum ATCC 13032 [6, 50].


Sugar Alcohols


Arabitol can be used by C. glutamicum as the sole carbon and energy source [13]. In the presence of arabitol, the catabolic operon comprising the genes xylB, rbtT, mtlD, and sixA is induced via the regulator AtlR [13]. Arabitol is taken up via the permease RbtT and is then oxidized to xylulose by an NAD-dependent arabitol dehydrogenase encoded by mtlD (Fig. 1). Subsequent phosphorylation to xylulose 5-phosphate overlaps with xylose metabolism and relies on the xylB-encoded xylulokinase. In addition, C. glutamicum carries a mannitol catabolic operon, but the presence of the autoregulatory protein AtlR (MtlR) prevents mannitol utilization. Deletion of the mtlR gene abolishes repression and enables transcription, likely producing a polycistronic mRNA of the two structural genes mtlT and mtlD, which encode an MFS-type transporter and NAD-dependent mannitol 2-dehydrogenase (Fig. 1), respectively [22]. As C. glutamicum lacks fructokinase activity, fructose, the product of mannitol oxidation, cannot be phosphorylated within the cell. Further metabolism therefore involves fructose efflux by a so far unassigned transporter and re-uptake by the fructose-specific PTS [25, 32, 33].


2.1.2 Embden-Meyerhof-Parnas Pathway


The Embden-Meyerhof-Parnas (EMP) pathway is a major route for the catabolic breakdown of sugars and sugar alcohols. Pathway control mainly occurs by metabolic regulation of glyceraldehyde 3-phosphate dehydrogenase (GAPDH) and pyruvate kinase (PYK), which are sensitive to the redox and energy state of the cell [10]. The redox state is sensed as the NADH/NAD+ ratio [32], whereas the energy level is sensed as the absolute concentrations of ATP and AMP [51]. Interestingly, 6-phosphofructokinase does not show the classical regulation pattern of activation by low-energy metabolites and inhibition by high-energy metabolites; instead, it is inhibited by ADP [10]. Gluconeogenesis, the opposing pathway, is transcriptionally induced by selected carbon sources such as pyruvate, lactate, glutamate, and acetate [17, 18, 52]. Additional control is exerted at the level of fructose 1,6-bisphosphatase (FBPase), which is strongly sensitive to metabolic regulation by AMP, PEP, and its own substrate [10]. During growth on glucose, C. glutamicum ATCC 13032 channels roughly 50% of carbon through the EMP pathway [53]. The relative contribution of the pathway to glucose degradation changes in response to cellular requirements. Systematic investigation of different lysine-producing strains revealed a reduced flux through the EMP pathway with increasing production performance [36, 53-58]. Interestingly, when the carbon source is fructose, the EMP pathway becomes the major catabolic route and carries more than 90% of the total flux [34]. This relates to the entry of fructose at the level of fructose 1,6-bisphosphate combined with a lack of in vivo FBPase activity, which forces the fructose carbon downstream into the EMP pathway [34]. Utilization of sucrose, a disaccharide of glucose and fructose, results in a flux pattern intermediate between those of the two hexoses [35]. As mentioned above, this observation led to the identification of FBPase as a bottleneck for lysine production [34, 35] and to overexpression of the encoding gene for improved production [36]. Other EMP pathway-related engineering strategies toward improved lysine production comprise deletion of pgi, encoding phosphoglucoisomerase [59], deletion of pyk, encoding pyruvate kinase [60], and cofactor engineering of glyceraldehyde 3-phosphate dehydrogenase [61-63] for improved NADPH supply. For products such as lactate and alanine, high glycolytic fluxes are favorable, and overexpression of glycolytic enzymes has been successfully applied to improve production [64-66]. Ornithine and arginine production also profited from overexpression of the glycolytic gene pgk, encoding phosphoglycerate kinase [67].


2.1.3 Pentose Phosphate Pathway 


The pentose phosphate (PP) pathway represents an alternative glycolytic route in C. glutamicum. The oxidative part comprises glucose 6-phosphate (G6P) dehydrogenase (zwf-opcA genes), 6-phosphogluconolactonase (devB gene), and 6-phosphogluconate (6PG) dehydrogenase (gnd gene). It is most relevant for the supply of redox power [51, 68]. The regenerative, or non-oxidative, route provides building blocks and also recycles excess carbon back into the EMP pathway. It involves transketolase (tkt gene) and transaldolase (tal gene) activity [51]. Transcriptional regulation has been little studied, although GntR-like regulators have been identified as repressors of the PP pathway genes tkt, tal, zwf, opcA, and devB [69]. Quantification of metabolite pools and elucidation of kinetic properties identified G6P dehydrogenase and 6PG dehydrogenase as major control points for carbon flux, mainly through sensing of the NADPH/NADP+ ratio [68]. The PP pathway is crucial for amino acid overproduction. For example, increased lysine production requires an increased flux into the pathway [58, 70]. This observation stimulated the design of strains with increased PP pathway flux to improve lysine production. Successful examples include overexpression and modification of zwf, encoding G6P dehydrogenase [54], implementation of a point mutation in the gnd gene [71], deletion of pgi, encoding phosphoglucoisomerase [59], start codon engineering of zwf and pgi [72], and overexpression of the full tkt operon [55]. The findings from flux analysis during growth on fructose [34] and sucrose [35] further revealed FBPase as an additional target to enhance PP pathway flux; its overexpression is beneficial for lysine production [36]. Similar engineering strategies were applied to improve other NADPH-demanding production processes, including those of diaminopentane [47, 73], L-isoleucine [74], L-valine [75], L-arginine [76], and L-ornithine [77], underlining the high importance of the PP pathway for biotechnological production in C. glutamicum. Beyond its essential role in NADPH supply, the PP pathway provides carbon building blocks, for example, for the biosynthesis of aromatic compounds. Here, the transketolase and transaldolase reactions of the non-oxidative PP pathway are of high importance. Overexpression of the tkt gene was successfully applied for targeted improvement of the production of L-phenylalanine [78] and L-tryptophan [79].
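Because G6P dehydrogenase and 6PG dehydrogenase each reduce one NADP+, every glucose 6-phosphate molecule that passes through the oxidative PP pathway yields two NADPH. The sketch below uses this stoichiometry to estimate how much NADPH the oxidative branch supplies for a given relative PP flux. The function name, uptake rate, and flux fractions are hypothetical values chosen for illustration, and other NADPH-forming reactions are deliberately ignored.

```python
# Each glucose 6-phosphate oxidized by G6P dehydrogenase and then by
# 6PG dehydrogenase yields two NADPH (one per dehydrogenase step).
NADPH_PER_G6P_OXIDIZED = 2


def oxidative_pp_nadph(glucose_uptake: float, relative_pp_flux: float) -> float:
    """NADPH formed in the oxidative PP pathway.

    glucose_uptake: glucose uptake rate, e.g. in mmol g^-1 h^-1
    relative_pp_flux: flux into G6P dehydrogenase as a fraction of uptake
    Other NADPH-forming reactions are ignored in this sketch.
    """
    if glucose_uptake < 0 or relative_pp_flux < 0:
        raise ValueError("rates and fractions must be non-negative")
    return NADPH_PER_G6P_OXIDIZED * relative_pp_flux * glucose_uptake


# Hypothetical comparison: same uptake rate, lower vs. higher PP pathway flux
for label, fraction in [("lower PP flux", 0.4), ("higher PP flux", 0.7)]:
    print(f"{label}: {oxidative_pp_nadph(4.0, fraction):.2f} mmol NADPH g^-1 h^-1")
```

The linear dependence on the PP flux fraction is why redirecting carbon into this pathway is such an effective lever for NADPH-demanding products.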


2.1.4 Tricarboxylic Acid Cycle and Glyoxylate Shunt 


The tricarboxylic acid (TCA) cycle is a key metabolic pathway of aerobic microorganisms such as C. glutamicum. It supplies biosynthetic precursors and energy: ATP (or GTP), as well as NADH and FADH2 for subsequent ATP generation via the respiratory chain and ATP synthase. Several nodes are under sophisticated metabolic and transcriptional control to modulate the carbon flux through the TCA cycle. The flux partitioning between the TCA cycle and the glyoxylate (Glx) shunt is adjusted at the level of isocitrate through metabolic control of the TCA cycle enzyme isocitrate dehydrogenase [10, 80], whereas isocitrate lyase, the entry enzyme into the Glx shunt, is controlled at the transcriptional level [17, 81] and inhibited by several metabolites, including 3-phosphoglycerate, 6-phosphogluconate, PEP, fructose 1,6-bisphosphate, succinate, and glyoxylate [82]. Further control of the TCA cycle occurs at the level of the 2-oxoglutarate dehydrogenase complex (ODHC) [83]. ODHC is activated and inhibited by several effector molecules [84] and is also regulated by the ODHC inhibitor protein OdhI [85]. In the improvement of C. glutamicum for L-glutamate production, altering the control of ODHC has proven valuable [85-89]. In line with this, production of the L-glutamate-derived γ-aminobutyrate could be improved by deletion of odhA [90]. For other added-value products, such as L-lysine and its derivative diaminopentane, the TCA cycle represents a competing pathway. Here, approaches for improving production efficiency intentionally reduced the flux through the TCA cycle, with citrate synthase [91] and isocitrate dehydrogenase [47, 92] selected as engineering targets. An innovative strategy coupled lysine formation to the TCA cycle flux through the elimination of succinyl-CoA synthetase [93]. Engineering of itaconic acid overproduction included targeted downregulation of isocitrate dehydrogenase [94] by using rare translational start codons [72, 92]. Combined with the deletion of malate synthase, this strategy was similarly applied for glycolate production [95].


2.1.5 Pyruvate Metabolism 


Pyruvate and PEP represent a central switch point in metabolism. They function as highly connected hubs between the EMP pathway and the TCA cycle, take part in PTS-dependent substrate uptake, are the starting point for overflow metabolism, and serve as building blocks for anabolism. C. glutamicum possesses a rich enzymatic set around the PEP-pyruvate node: pyruvate carboxylase (PCx), PEP carboxylase (PEPCx), pyruvate kinase (PK), the pyruvate dehydrogenase complex (PDHC), pyruvate:quinone oxidoreductase (PQO), PEP carboxykinase (PEPCk), malic enzyme (MalE), and a putative oxaloacetate decarboxylase (Odx) [96, 97]. Owing to the concerted action of multiple carboxylation and decarboxylation reactions in vivo, the metabolism of C. glutamicum at this node is highly flexible, which is important for responding rapidly to changing conditions [97, 98]. Metabolic regulation of the different enzymes appears to be significant: PCx is the major anaplerotic enzyme, contributing 90% of the total anaplerotic flux in vivo, although the in vitro activity of PEPCx is substantially higher [98-100]. The relevance of this metabolic switch point has motivated numerous engineering strategies toward the production of different industrial goods (Table 1).


2.2 Anabolism


Cells of C. glutamicum are mainly composed of five classes of macromolecules, namely protein, DNA, RNA, lipids, and cell wall carbohydrates (Fig. 2). Like almost all Corynebacterium species, C. glutamicum exhibits a complex cell wall architecture: a peptidoglycan layer covers the plasma membrane and is itself bound to arabinogalactan, a complex heteropolysaccharide meshwork [122]. The plasma membrane mainly contains oleic acid (18:1) and palmitic acid (16:0) [123]. The cell wall is rather unusual, as it contains diaminopimelic acid and an outer membrane with mycolic acids [124]. The anabolic pathways in C. glutamicum are well established. Biomass building blocks, such as amino acids, nucleotides, fatty acids, and carbohydrates, are synthesized from glucose 6-phosphate, fructose 6-phosphate, ribose 5-phosphate, erythrose 4-phosphate, glyceraldehyde 3-phosphate, 3-phosphoglycerate, pyruvate, phosphoenolpyruvate, acetyl-CoA, 2-oxoglutarate, succinyl-CoA, and oxaloacetate [58]; they are then assembled into the corresponding macromolecules and also occur as free intracellular pools [58, 93, 121]. Besides carbon precursors, anabolism also demands reducing equivalents and energy. Because of the great interest in C. glutamicum, its cellular composition has been precisely determined, and the specific demand for individual precursors as well as for redox power and energy is well known [58]; these values can be used, for example, to infer metabolic fluxes [125, 126]. The synthesis of 1 g cell dry mass requires about 16,400 μmol of NADPH [58]. Interestingly, the ATP demand of 6,779 μmol g−1 is mainly due to the polymerization of cell protein rather than to precursor biosynthesis.
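The biomass-specific demands quoted above (about 16,400 μmol NADPH and 6,779 μmol ATP per gram cell dry mass [58]) translate directly into anabolic supply rates: multiplying a demand by the specific growth rate gives the rate at which that cofactor must be provided for biomass formation. The sketch below assumes a hypothetical growth rate of 0.4 h⁻¹; it covers anabolic demand only and leaves out maintenance requirements.

```python
# Anabolic demand per gram cell dry mass, as given in the text [58]
NADPH_DEMAND_UMOL_PER_G = 16_400
ATP_DEMAND_UMOL_PER_G = 6_779


def anabolic_rate_mmol_per_g_h(demand_umol_per_g: float,
                               growth_rate_per_h: float) -> float:
    """Cofactor supply rate needed for biomass formation, in mmol g^-1 h^-1."""
    if growth_rate_per_h < 0:
        raise ValueError("growth rate must be non-negative")
    return demand_umol_per_g * growth_rate_per_h / 1000.0  # umol -> mmol


mu = 0.4  # hypothetical specific growth rate (h^-1)
print(f"NADPH: {anabolic_rate_mmol_per_g_h(NADPH_DEMAND_UMOL_PER_G, mu):.2f} mmol/g/h")
print(f"ATP:   {anabolic_rate_mmol_per_g_h(ATP_DEMAND_UMOL_PER_G, mu):.2f} mmol/g/h")
```

This is the same bookkeeping that allows the known cellular composition to serve as a constraint when metabolic fluxes are inferred.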


2.3 Regulation and Control of Metabolism


In the past few years, transcriptional regulation of C. glutamicum has been studied extensively, largely driven by powerful whole-genome transcriptome profiling technologies such as DNA microarrays and RNA sequencing. This has revealed a highly complex transcriptional regulatory network (TRN), comprising about 160 genes encoding DNA-binding transcriptional regulators, various sigma (σ) factors, and additional regulatory proteins [127]. Today, the TRN of C. glutamicum is available from interactive databases, such as the web-based platform “CoryneRegNet,” which are continuously updated with novel findings [128]. Knowledge of pathway and expression regulation has substantially supported metabolic engineering. Here we give an overview of its general features, complemented with applications in the field of metabolic engineering.

Table 1 Metabolic engineering of pyruvate metabolism in C. glutamicum toward improved production of industrially relevant goods

Enzyme  Modification  Product  Effect  References
Pyruvate dehydrogenase  ΔaceE  L-Valine  +  [101]
aceE*!¢  L-Lysine  +  [102]
aceE A16(a)  Isobutanol  −  [103]
L-Lysine  +  [72]
L-Valine  +  [104]
L-Lysine  +  [104]
2-Oxoisovalerate  +  [104]
Pyruvate kinase  Δpyk  L-Glutamate  +  [105]
L-Lysine  ○  [106]
L-Lysine  −  [60]
Pyruvate carboxylase  pEKE3x-pyc  Putrescine  +  [107]
pycP458S(b)  L-Glutamate  +  [100]
PyogpycP 85S  L-Threonine  +  [100]
Δpyc  L-Lysine  +  [100]
L-Lysine  +  [108]
L-Lysine  +  [55]
L-Glutamate  +  [109]
Succinate  a  [110]
Lactate  +  [110]
Isobutanol  +  [103]
PEP carboxylase  Δppc  Ethanol  +  [111]
PPCmut(c)  Succinate  −  [110]
pAJ43-ppc  L-Glutamate  =  [109]
pECt-ppc  Isobutanol  ct  [103]
ppcN?'76 4  L-Tryptophan  +  [112]
ppcr?2°N q  L-Threonine  +  [113]
L-Proline  +  [113]
L-Glutamate  +  [109]
L-Lysine  +  [114]
L-Glutamate  +  [115]
PEP carboxykinase  Δpck  L-Lysine  +  [116, 117]
pEK-pck  L-Lysine  −  [116]
Malic enzyme  pVWEx1-malE  L-Lysine  ○  [118]
ΔmalE  Isobutanol  =  [119]

The individual effects of genetic changes on production are given as follows: improved (+), decreased (−), not changed (○)
(a) Replacement of the native promoter by dapA promoter variant A16 [120]
(b) Pyruvate carboxylase variant with reduced sensitivity for inhibition
(c) Mutated PEP carboxylase isolate with 75% reduced activity
(d) PEP carboxylase variant with reduced sensitivity for inhibition

Fig. 2 Cellular composition of C. glutamicum. The anabolic demand for synthesis of the macromolecules was taken from previous work [58, 121]
Fig. 3 Molecular structure of the transcriptional initiation site in C. glutamicum, mediating RNA polymerase binding and the start of transcription, with α, β, β′, and ω indicating the RNA polymerase subunits. Consensus sequences at −35 and −10 upstream of the transcription start point are given
for the sigma factors oO, oF, o™, and of, Upper case letters indicate sequence conservation of over 
80%, lower case letters of over 40% [129] 


2.3.1 Sigma Factors 


Promoter sequences upstream of structural genes are the core element in the regulation of gene expression in C. glutamicum. In short, the promoter region usually consists of 40-50 base pairs and binds the transcription machinery, the RNA polymerase complex. Here, a regulatory protein called the sigma (σ) factor interacts with the partially melted double-stranded DNA and affects the binding capacity of the complex and transcription initiation. In addition, the promoter consensus sequences located 35 and 10 nucleotides upstream of the transcriptional start point at position +1 (Fig. 3), together with the spacer sequence between them, influence the initiation of transcription [130, 131]. C. glutamicum possesses different sigma factors, all of which belong to the σ70 family. Among other mechanisms, cells vary the expression of the individual sigma factors to regulate their gene expression. Overall, seven sigma factors (σA, σB, σC, σD, σE, σH, and σM) have been discovered in C. glutamicum [130, 132, 133]. The expression of housekeeping genes is mainly controlled by σA, whereas stress-related expression is primarily under the control of σB, σH, and σM [134-136]. Mutants lacking σB are more sensitive to heat, cold, salt, acid, and alcohol stress [137]. If oxygen supply is limited, σB promotes the expression of glucose uptake genes and of several genes of the EMP pathway and the TCA cycle [138].
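The promoter architecture described above (a −35 box, a spacer, and a −10 box upstream of the +1 start point) can be searched for computationally. The sketch below scores every candidate position by counting mismatches to a −35 motif and a −10 motif separated by a spacer of allowed length. The motifs (TTGACA and TATAAT) and the 16-18 bp spacer range are the classical E. coli σ70-type values, used here purely as placeholder assumptions; the σ-factor-specific consensus sequences of C. glutamicum (Fig. 3) differ and would need to be substituted, and practical promoter prediction typically uses position weight matrices rather than plain mismatch counts.

```python
from typing import NamedTuple


class PromoterHit(NamedTuple):
    start_35: int    # 0-based index of the -35 box
    start_10: int    # 0-based index of the -10 box
    spacer: int      # number of bases between the two boxes
    mismatches: int  # total mismatches to both motifs


def count_mismatches(window: str, motif: str) -> int:
    """Hamming distance between two equal-length strings."""
    return sum(a != b for a, b in zip(window, motif))


def scan_promoters(dna: str, box35: str = "TTGACA", box10: str = "TATAAT",
                   spacers: range = range(16, 19),
                   max_mismatches: int = 3) -> list[PromoterHit]:
    """Report -35/-10 box pairs with at most `max_mismatches` in total.

    The default motifs and spacer range are placeholders (E. coli
    sigma70-like), not C. glutamicum consensus sequences.
    """
    dna = dna.upper()
    hits = []
    for i in range(len(dna) - len(box35) + 1):
        mm35 = count_mismatches(dna[i:i + len(box35)], box35)
        if mm35 > max_mismatches:
            continue
        for spacer in spacers:
            j = i + len(box35) + spacer
            if j + len(box10) > len(dna):
                break  # spacers increase, so longer ones cannot fit either
            total = mm35 + count_mismatches(dna[j:j + len(box10)], box10)
            if total <= max_mismatches:
                hits.append(PromoterHit(i, j, spacer, total))
    return sorted(hits, key=lambda hit: hit.mismatches)


# Synthetic sequence with a perfect -35/-10 pair separated by 17 bp
example = "GC" + "TTGACA" + "A" * 17 + "TATAAT" + "GGATCC"
print(scan_promoters(example)[0])
```

Scoring the two boxes jointly, rather than each alone, reflects the point made above that the spacing between the consensus elements also influences initiation.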


2.3.2 Transcriptional Regulators 


The basic level of the TRN consists of local regulators, each of which controls a small set of genes related to a rather specific function. An example is fructose assimilation, regulated by the local regulator FruR, which controls the expression of fructose-specific genes [5, 139, 140]. Such functionally related genes are usually located in an operon- or divergon-like structure, the latter being a pair of divergently transcribed operons. Master regulators with superimposed function occupy a higher level of control within the TRN. They orchestrate complex cellular programs related to carbon metabolism (e.g., RamAB, SugR, SigB), nitrogen metabolism (AmtR), phosphorus metabolism (PhoR), sulfur (e.g., McbR, CysR) and iron (DtxR) homeostasis, respiration/anaerobiosis (ArnR), and stress responses (e.g., LexA, SigH) for cell survival [141, 142]. As an example, AmtR inhibits transcription of amtA (amino-methyl transferase), amtB (ammonium transporter), glnA (glutamine synthetase), gltBD (glutamate synthase), and dapD (tetrahydrodipicolinate succinylase). It also controls linked pathways of creatinine and urea metabolism: codA (creatinine deaminase), crnT (creatinine transporter), urtABCDE (ABC-type urea transporter), and ureABCEFGD (urease). AmtR activity itself responds to the ammonium level via a signal cascade involving UTase (uridylyltransferase) and the regulatory protein GlnK [143, 144]. Deletion of amtR results in deregulation of the ammonium uptake system in C. glutamicum [145]. Detailed knowledge of nitrogen metabolism appears especially valuable for amino acid and diamine production processes. The ammonium level not only defines the assimilation route and thus the “energetic cost” of uptake [146] but might also influence pathway usage, as demonstrated for lysine production [146, 147]. These findings guided the metabolic design and engineering of C. glutamicum for the production of L-lysine [55], diaminopentane [73], and ectoine [148].

The McbR regulator has been a focus of research because it exerts substantial control over the biosynthesis of the feed amino acid methionine [149]. Its deletion caused oxidative stress [150] and imbalances in the metabolism of sulfur-containing amino acids, resulting in the accumulation of pathway intermediates and the activation of normally silent pathways [151-153].

The major response regulator in C. glutamicum for heat and oxidative stress is
SigH (σH) [7], a sigma factor responsible for the transcription of sigA, sigM, and
sigB [7, 154]. When cells are exposed to heat stress, σH activates transcription of
clpC (Clp ATPase subunit), clpP1P2 (Clp protease subunits), and clgR (ClgR,
positive regulator of clpP1P2). In parallel, σH controls ClgR by modulating its
stability and transcription [155, 156]. During the heat shock response, σH additionally
controls the expression of other regulators: ClgR, SufR, WhcA, and WhcE
[157]. Depending on the imposed temperature, the regulatory system shows an
intensity-dependent response for HrcA/CIRCE-regulated genes, but not for genes
regulated by HspR/HAIR [158]. The heat stress response cascade results in the
activation of molecular chaperones, which stabilize cellular proteins. During
exposure to oxidative stress, σH activates the expression of whcE and whcA, with
whcE being a repressor of whcA and whcB under normal growth conditions
[159]. Phenotypically, increased expression of whcB and downregulation
or deletion of whcE improve growth. Besides heat shock proteins, molecular
chaperones and ATP-dependent proteases are upregulated [7]. Upon stress,
C. glutamicum also changes the expression of genes of core carbon metabolism
[160]. At increased temperature, the citrate synthase gene gltA is expressed less,
whereas the expression of malE, encoding malic enzyme, is increased
[161]. These natural metabolic responses can be harnessed for bio-based production.
Examples include improved lysine [161] and ectoine [148] production at
higher temperature. Under hyperosmotic stress, most of the substrate is used for ATP
generation and is directed toward glycolysis and the TCA cycle to satisfy the higher
maintenance demand [162].
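The σH-centred relationships described above can be pictured as a small signed, directed graph. The Python sketch below is a hypothetical illustration rather than a curated network model: it encodes only the edges named in the text (σH activating sigA, sigM, sigB, clpC, clpP1P2, clgR, whcA, and whcE; σH controlling sufR with an unstated sign; ClgR activating clpP1P2; WhcE repressing whcA and whcB). The helper names `build_network` and `downstream` are our own.

```python
from collections import deque

# Edges stated in the text: (regulator, target, effect).
# "+" = activation, "-" = repression, "?" = control whose sign is not stated.
EDGES = [
    ("sigH", "sigA", "+"),
    ("sigH", "sigM", "+"),
    ("sigH", "sigB", "+"),
    ("sigH", "clpC", "+"),
    ("sigH", "clpP1P2", "+"),
    ("sigH", "clgR", "+"),
    ("sigH", "sufR", "?"),
    ("sigH", "whcA", "+"),
    ("sigH", "whcE", "+"),
    ("clgR", "clpP1P2", "+"),
    ("whcE", "whcA", "-"),
    ("whcE", "whcB", "-"),
]


def build_network(edges):
    """Group edges into an adjacency dict: regulator -> [(target, effect), ...]."""
    network = {}
    for regulator, target, effect in edges:
        network.setdefault(regulator, []).append((target, effect))
    return network


def downstream(network, start):
    """Return every gene reachable from `start` (breadth-first search)."""
    reached = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for target, _effect in network.get(node, []):
            if target not in reached:
                reached.add(target)
                queue.append(target)
    return reached


network = build_network(EDGES)
print(sorted(downstream(network, "whcE")))  # ['whcA', 'whcB']
```

A breadth-first traversal with a visited set was chosen because it terminates even if feedback loops, such as the negative autoregulation of master regulators, are later added as self-edges.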

The different modules of the TRN in C. glutamicum are not fully autonomous,
but are interconnected via regulators that act as interfaces
[142, 163]. Negative autoregulation of the master regulators enables fine-tuned
gene expression together with a rapid response to imposed environmental changes
[164, 165]. Regarding global regulators at the very top of the TRN cascade, the only
protein identified so far in C. glutamicum is GlxR (Fig. 4). It mediates the cellular
response to changing levels of the signal molecule cAMP [142, 167, 168]. The
numerous GlxR-specific DNA-binding sites in the genome of C. glutamicum suggest
that this global regulator controls up to 14% of the genes and transcriptional
regulators of C. glutamicum.


2.3.3 Small RNAs


Small RNAs are short, non-coding RNA molecules that control the stability and
the translation efficiency of mRNA. These regulatory elements dynamically change
gene expression. Small RNAs seem to be the most abundant posttranscriptional
regulators in bacterial cells. Transcriptome sequencing revealed a variety of these
regulatory elements to be present in C. glutamicum [169]. For higher organisms,
such as fission yeast and the multicellular model Caenorhabditis elegans, the role
of small RNAs has been extensively investigated [170]. For C. glutamicum, however,
less is known at present, but first insights into transcriptional regulation via
small regulatory RNAs reveal that they may also play an important role in this
bacterium [171]. Analysis of RNA-seq data from the sRNA cDNA library of
C. glutamicum verified short transcripts at the known transcriptional attenuator
sites of the trp operon, the ilvBNC operon, and the leuA gene [172]. Further
elucidation promises advances for the production of related products such as
aromatic and branched-chain amino acids and biofuels.

Fig. 4 Transcriptional regulatory network of C. glutamicum, involving the σ factors σA, σM, and σH
and the master regulators AmtR, WhcB, and HspR, which are involved in the regulation of
ammonium assimilation, oxidative stress, and heat shock response, respectively. Repression is
indicated by T-bars, activation by arrows. AmtR regulatory protein for ammonium
assimilation, amtA amino-methyltransferase, amtB ammonium transporter, glnA glutamine
synthetase, gltBD glutamate synthase, dapD tetrahydrodipicolinate succinylase, codA creatinine
deaminase, crnT creatinine transporter, urtABCDE ABC-type urea transporter, ureABCEFGD
urease, gdh glutamate dehydrogenase, clpC Clp ATPase subunit, clpP1P2 Clp protease
subunits, clgR positive regulator of clpP1P2, sigA, sigM, sigH respective σ factors, whcB WhcB
regulator of growth phase transition, whcA WhcA regulator with SpiA regulation of stress response
genes, whcE WhcE regulator of growth and activation of stress response, hspR HspR repressor of
clgR transcription, ClgR regulation of clpP1P2 and heat stress response, sufR SufR regulator of
heat stress response [7, 133, 143-145, 155, 157-159, 166]


3 Molecular Tools for Genetic Engineering 


When approaching systems and synthetic metabolic engineering, the availability of
molecular tools for DNA manipulation is essential. First steps toward genetic
engineering of C. glutamicum were taken in the 1980s with the discovery and
isolation of natural plasmids [173, 174] and the development of DNA transfer methods
[175]. Today, episomal and genome-based DNA manipulation are routine
techniques, owing to the availability of the genome sequence [176, 177], the
development and advancement of episomal [129, 178] and integrative plasmids
[179, 180], optimized transformation methods [181, 182], and cloning and expression
procedures [183, 184].


3.1 Plasmids 


For genetic manipulation of C. glutamicum, different types of plasmids have been
developed. Autonomously replicating plasmids are mainly based on naturally
occurring cryptic plasmids of C. glutamicum [129]. For amplification, maintenance,
and propagation, plasmids are designed as C. glutamicum/E. coli shuttle
vectors and are equipped with selection markers conferring antibiotic resistance
[129, 184, 185]. This enables their use as cloning, promoter-probe, and expression
vectors [10]. Modification of the chromosome of C. glutamicum became
possible with vectors lacking replicon elements functional in C. glutamicum.
Chromosomal integration is commonly achieved via homologous recombination
[186, 187], including site-specific integration into IS elements and phage
sequences [185, 186, 188-190]. The discovery and application of the Bacillus subtilis
levansucrase gene (sacB) as a counter-selectable marker was a major breakthrough for
genome-based manipulation of C. glutamicum [179]. This conditionally lethal
marker system is the most convenient system for genetic engineering of
C. glutamicum [36, 101, 148, 187, 191, 192]. The survival rate of plasmids
after transformation remains a critical factor, as it directly correlates with
the success rate of genetic manipulation. In this regard, circumventing the natural
defense system by which C. glutamicum enzymatically degrades foreign DNA substantially
improved genetic engineering [10]. Successful strategies include exposure to
heat, solvent, or pH stress [193-195], plasmid transfer through hosts compatible with
C. glutamicum [196], use of an intermediate cloning host that adds the
C. glutamicum-specific DNA methylation pattern [197], and the use of synthetic
[198, 199] and non-methylated DNA [182, 200].
The σ-factor binding region influences the strength of a promoter, because the
nucleotide sequence defines the binding efficiency of the RNA polymerase and thus
the efficiency of transcription initiation. Consequently, promoter modification offers
plenty of optimization possibilities for metabolic engineering of C. glutamicum. In
this regard, the strong constitutive promoters of superoxide dismutase
(Sod), elongation factor Tu (EF-Tu), and the chaperone GroEL have proved to be
valuable for a targeted increase of gene expression in C. glutamicum [14, 36, 48, 55,
76, 77, 197]. Beyond the natural set of promoters, synthetic promoter libraries have
been developed through site-directed mutagenesis and randomization of promoter
length [120, 201, 202]. The resulting variety of weak and strong promoters provides
higher flexibility for gradually decreasing or increasing gene expression and thus
for fine-tuning enzyme and pathway activities. Successful applications include the
production of L-valine [201] and L-lysine [91] and high-level expression of
endo-xylanase [203]. The set of synthetic promoters also includes inducible promoters
relying on IPTG [202, 203] or on carbon sources such as gluconate and maltose,
allowing substrate-dependent pathway modification [204]. With the advent of
synthetic metabolic engineering, the recruitment of genes from heterologous
donor strains became increasingly popular. This was often hampered by
differences between donor and host in GC content and codon usage.
Here, substantial benefit was achieved by using synthetic genes
codon-optimized for C. glutamicum, as was done for the E. coli-derived lysine
decarboxylase gene ldcC to improve diaminopentane production [197].
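As a minimal illustration of codon optimization, the sketch below back-translates a protein by picking one preferred codon per amino acid and reports the GC content of the result. The codon table is hypothetical and covers only a few residues; it is not a measured C. glutamicum codon-usage table, and practical codon-optimization tools weigh further factors (for example mRNA secondary structure and repeated sequences) that are omitted here.

```python
# Hypothetical preferred codon per amino acid (illustration only; NOT a
# measured C. glutamicum codon-usage table). Only a few residues are covered.
PREFERRED_CODON = {
    "M": "ATG",
    "K": "AAG",
    "L": "CTG",
    "E": "GAG",
    "S": "TCC",
    "*": "TAA",  # stop
}


def back_translate(protein, table=PREFERRED_CODON):
    """Encode each residue with the single preferred codon from `table`."""
    missing = sorted(set(protein) - set(table))
    if missing:
        raise KeyError(f"no codon defined for residues: {missing}")
    return "".join(table[aa] for aa in protein)


def gc_content(dna):
    """Fraction of G and C bases in a DNA sequence."""
    dna = dna.upper()
    return (dna.count("G") + dna.count("C")) / len(dna)


cds = back_translate("MKLES*")
print(cds)                        # ATGAAGCTGGAGTCCTAA
print(round(gc_content(cds), 3))  # 0.444
```

The "one best codon" rule is the simplest possible strategy; a real host-specific table would be substituted for `PREFERRED_CODON`.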


3.3 Small RNAs 


Small RNAs dynamically control gene expression, which is attractive for strain
engineering. Trans-encoded sRNAs (Fig. 5c-e), transcribed from regions separate
from their target genes, can either inhibit or promote translation of the
target mRNA. Pairing with the 5′ UTR and the ribosome-binding site blocks translation.
The formation of sRNA-mRNA complexes leads to degradation by RNases.
Activation of translation is triggered if trans-encoded sRNAs prevent the formation
of inhibitory structures around the ribosome-binding site [205-208]. A cis-antisense
sRNA (Fig. 5a, b), transcribed from the opposite strand of the target DNA,
shows high complementarity and acts highly specifically in three different ways. By
binding to the ribosome-binding site of the target mRNA, it inhibits translation
and activates RNA degradation. A cis-antisense sRNA that binds in the
intergenic region between two genes of an operon can cause cleavage into
two mRNAs. Such sRNAs can also function as transcriptional terminators
[205, 209-211]. The length of sRNA molecules varies between 50 and 300 nt
[169, 212]. Most small RNAs discovered in C. glutamicum are structured antisense RNAs,
but cis-antisense RNAs, trans-encoded RNAs, and other regulatory elements such as
riboswitches have been found as well. The exact functions of most of these
elements are not yet fully understood. Unlike most bacteria, C. glutamicum lacks
the non-coding 6S RNA [169, 213, 214]. Recently,
bioinformatics tools have been developed to predict the translation initiation rate of
mRNAs regulated by small RNAs. Such approaches reveal the considerable
potential of directed regulation with small RNAs for metabolic engineering
[215]. In Escherichia coli, the regulatory function of short transcripts has been
exploited for metabolic engineering to silence genes and fine-tune gene expression,
and computational approaches were used to predict targets and binding efficiency
[216-219]. In contrast to E. coli, however, C. glutamicum lacks a protein similar to the
RNA chaperone Hfq, which was used as a supporting protein for metabolic engineering
in E. coli [169, 217, 220].

Fig. 5 Transcriptional and translational regulation via cis- and trans-encoded antisense RNA.
Cis-encoded antisense RNA attenuates translation and induces mRNA degradation. Trans-encoded
antisense RNAs are transcribed from a gene sequence distant from their target mRNA. (a)
Cis-encoded antisense RNA (orange) is transcribed between two genes (green) and is highly
complementary to the target mRNA. The sRNA (red) can function as a transcriptional
terminator or cleave the mRNA into two fragments in order to alter translation. (b)
Pairing with bases near the ribosome-binding site. (c) Imperfect base pairing suppresses
translation of the mRNA. (d) RNase degradation can be triggered by binding of the
trans-antisense small RNA. (e) Translation is initiated by inhibiting the formation of
RBS-blocking structures (adapted from [205])
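The antisense mechanisms described above depend on base pairing between an sRNA and its target mRNA. The sketch below illustrates this, assuming perfect Watson-Crick pairing only. It checks whether an antisense RNA pairs with the mRNA region carrying the Shine-Dalgarno motif, which is the configuration expected to block translation. The function names and the mRNA sequence are hypothetical. Imperfect pairing (Fig. 5c), G-U wobble pairs, and RNA folding are deliberately ignored.

```python
# Watson-Crick complements for RNA (G-U wobble pairs are ignored here).
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Reverse complement of an RNA sequence, both written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna.upper()))


def pairs_with(srna, mrna):
    """Start index in `mrna` of a perfect antiparallel match to `srna`, or -1."""
    return mrna.find(reverse_complement(srna))


def covers_rbs(srna, mrna, rbs_motif="GGAGG"):
    """True if the region of `mrna` paired by `srna` overlaps the RBS motif."""
    start = pairs_with(srna, mrna)
    rbs = mrna.find(rbs_motif)
    if start < 0 or rbs < 0:
        return False
    end = start + len(srna)
    return start < rbs + len(rbs_motif) and rbs < end


# Hypothetical 5' region of an mRNA, including the AUG start codon.
mrna = "AUCUAAGGAGGUAUCGAUGAAA"
print(covers_rbs("AUACCUCCUU", mrna))  # True: pairs with AAGGAGGUAU
print(covers_rbs("UUUCAUCG", mrna))    # False: pairs downstream of the RBS
```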


3.4 Riboswitches 


Riboswitches are sequences at the 5′ end of mRNAs whose conformation can vary
depending on cellular conditions. Riboswitches consist of two units, the aptamer
and the expression platform. The aptamer is the binding part of
the riboswitch. Direct binding of a specific metabolite as a ligand modulates the structure
of the expression platform, resulting in either transcription termination or initiation,
for example through formation of a hairpin structure [221-224]. Recent
studies report the modulation of C. glutamicum metabolism by Bacillus subtilis and
E. coli lysine riboswitches. The constructed strains carried a lysine riboswitch
between the promoter and the start codon of the gltA gene, encoding citrate
synthase. Lysine binds as a ligand to the riboswitch and promotes transcription
termination. In this way, TCA cycle activity was downregulated [221]. The
lysine "Off" riboswitch of Escherichia coli was further engineered to function as
a lysine "On" riboswitch in C. glutamicum by randomizing the sequence
between the aptamer and the ribosome-binding site, which caused upregulation
of target genes [225].
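The contrast between the lysine "Off" and "On" riboswitches can be captured in a toy dose-response model. The sketch assumes a simple Hill-type ligand occupancy. The half-saturation constant `k_half`, Hill coefficient `hill_n`, and the basal and maximal expression levels are hypothetical values, not measurements for the riboswitches cited above. In an "Off" switch, bound lysine lowers expression of the downstream gene (for example gltA). In an "On" switch, bound lysine raises it.

```python
def occupancy(ligand, k_half, hill_n):
    """Fraction of riboswitch molecules with ligand bound (Hill equation)."""
    if ligand <= 0:
        return 0.0
    x = (ligand / k_half) ** hill_n
    return x / (1.0 + x)


def off_switch(ligand, k_half=1.0, hill_n=1.0, basal=0.05, maximum=1.0):
    """Relative expression behind an 'Off' riboswitch: bound ligand represses."""
    return basal + (maximum - basal) * (1.0 - occupancy(ligand, k_half, hill_n))


def on_switch(ligand, k_half=1.0, hill_n=1.0, basal=0.05, maximum=1.0):
    """Relative expression behind an 'On' riboswitch: bound ligand activates."""
    return basal + (maximum - basal) * occupancy(ligand, k_half, hill_n)


# Lysine concentrations in arbitrary units relative to the hypothetical k_half.
for lysine in (0.0, 1.0, 10.0):
    print(lysine, round(off_switch(lysine), 3), round(on_switch(lysine), 3))
```

At `ligand == k_half`, both switches sit halfway between basal and maximal expression. This is where the two designs cross over.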


3.5 Translation Efficiency


The translation initiation rate (TIR) is a key parameter in promoter-regulated gene
expression [131]. Translation usually starts with binding of the ribosome to the
Shine-Dalgarno sequence, located upstream of the start codon near the 5′ end of the
mRNA. Shine-Dalgarno sequences are highly conserved and differ only slightly from the
consensus sequence GGAGG. Recognition occurs via the complementary anti-Shine-Dalgarno
sequence at the 3′ end of the 16S rRNA. The ribosome-binding site has a strong
impact on the protein expression level and is consequently a valuable molecular
tool for strain engineering [226, 227]. In addition, the translational start codon as
well as the genetic context and the mRNA structure are crucial factors for translation
initiation [135, 228]. Start codon exchange is a straightforward tool for the metabolic
engineering of C. glutamicum: the most abundant start codon, ATG, results
in higher expression levels than the rare variants GTG and TTG [72, 92,
229]. Translational efficiency is also influenced by the 5′-untranslated region
(UTR) of the mRNA. Modification of the UTR is highly attractive for rational strain
engineering, as translation efficiency can be either reduced or increased to
attenuate competing pathways and stimulate supporting ones [230-232]. The mode of
action is a change of mRNA conformation that influences transcript stability via
sensitivity to endonuclease cleavage [233]. In C. glutamicum, the effect of UTR
manipulation was nicely demonstrated for GFP expression [203, 226]. Beyond
modification of expression strength, appropriate secretory signal peptides can
be used to direct proteins to the secretory apparatus for efficient protein secretion
[226]. With all these molecular engineering systems at hand, today's researchers are
well equipped for targeted engineering of C. glutamicum metabolism to
improve existing production processes or to establish new ones.
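Two of the translation-level handles discussed above, the Shine-Dalgarno site and the start codon, can be illustrated with simple sequence operations. The sketch below is an assumption-laden simplification with hypothetical names and sequences. It scores every window of a 5′-UTR by its Hamming distance to the consensus GGAGG. It also exchanges the start codon, for example GTG to ATG to raise expression or ATG to GTG to attenuate it. Real TIR predictors additionally account for spacing to the start codon and mRNA folding, which this sketch ignores.

```python
SD_CONSENSUS = "GGAGG"
START_CODONS = ("ATG", "GTG", "TTG")


def mismatches(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def best_sd_site(utr, consensus=SD_CONSENSUS):
    """Return (position, site, mismatches) of the window closest to `consensus`."""
    best = None
    for i in range(len(utr) - len(consensus) + 1):
        site = utr[i:i + len(consensus)].upper()
        score = mismatches(site, consensus)
        if best is None or score < best[2]:  # keep the first of equally good sites
            best = (i, site, score)
    return best


def start_codon(cds):
    """Return the start codon of a coding sequence, checking it is ATG/GTG/TTG."""
    codon = cds[:3].upper()
    if codon not in START_CODONS:
        raise ValueError(f"unexpected start codon: {codon}")
    return codon


def exchange_start_codon(cds, new_codon):
    """Swap the start codon, e.g. GTG -> ATG to raise or ATG -> GTG to lower expression."""
    start_codon(cds)  # validate the existing start codon
    if new_codon not in START_CODONS:
        raise ValueError(f"unsupported start codon: {new_codon}")
    return new_codon + cds[3:]


utr = "AATTAAAGGAGGTTTCAAA"  # hypothetical 5'-UTR (DNA notation)
print(best_sd_site(utr))                         # (7, 'GGAGG', 0)
print(exchange_start_codon("GTGAAACTG", "ATG"))  # ATGAAACTG
```

Because all three start codons are read as methionine at initiation, exchanging them alters expression strength without changing the encoded protein.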


4 Advanced Strain Engineering for Industrial Bioproduction


The power of systems metabolic engineering strategies, together with a strong need
for bio-based and eco-efficient production of chemicals, materials, and fuels, is a
major driver for the development of novel production processes from renewables
[103, 197, 234, 235]. For C. glutamicum, an extensive product spectrum is now
obtainable through fermentation of diverse biomass components: some products are
fully established on the market, others are still at the periphery (Fig. 6).

The early years of strain development relied heavily on random
mutagenesis and selection, which, on the one hand, allowed fast progress and
improvement without detailed knowledge of metabolism and physiology and,
on the other hand, entailed a vast metabolic burden, manifested as growth deficiency,
auxotrophy, and low robustness and vitality [3]. With ever-increasing
knowledge of metabolic pathways and the development of molecular tools for
DNA manipulation, classical methods were complemented by targeted pathway
design and engineering for local manipulation of production-relevant key enzymes
and reactions [36, 54, 72, 79, 86, 92, 100, 118, 242, 243]. However, a systems
perspective was still missing until the establishment of advanced tools for global
analysis of the cell on its various layers, namely genome, transcriptome, proteome,
metabolome, and fluxome [244]. This enabled the next level of rational strain
engineering: systems and synthetic metabolic engineering [12, 245, 246]. Systems
biology data combined with computational platforms now provided a sound basis
for global conceptual design and substantially promoted cellular engineering. This
was a milestone toward generating tailored cell factories of C. glutamicum for
production of L-lysine [55], diaminopentane [73], and L-arginine [76]. When
the targets are defined metabolic features, enzymes, and biosynthetic pathways,
systems metabolic engineering with all its aspects is most valuable for defining the
best strategy. For multi-target traits such as tolerance, which involve more
complex mechanisms, this approach reaches its limits and was hence recently
complemented by evolutionary engineering [245]. Grown in a stress-imposed
environment, C. glutamicum was successively evolved to become more tolerant
to oxidative [247], thermal [248], solvent [248], and methanol [249] stress, followed
by systems biology analysis to unravel the underlying cellular features that
confer tolerance. Combining evolution with biosensor-coupled product
detection [250] or targeted metabolic re-engineering promises strains with
improved production performance and greater robustness.

Fig. 6 Product portfolio of metabolically engineered C. glutamicum through fermentation from
renewable feedstocks [3, 9, 12, 94, 137, 236-241]. Products comprise commercial goods for
diverse application sectors including feed and food, health and hygiene, energy and transportation,
textiles, packaging and housing, and agricultural and technical applications
4.1 Human and Animal Health and Nutrition


Products for health and nutrition have the longest history in biotechnology, with 
C. glutamicum being one of the major producers [9, 12, 251, 252]. Among all 
products obtainable by C. glutamicum, L-amino acids have the longest tradition and 
hold the largest market share. Meanwhile, processes for other products including 
non-proteinogenic amino acids [253-255], vitamins [236, 256], flavors and fra- 
grances [257], and other nutrients and health care products [148, 258, 259] are also 
on the rise. 


4.1.1 L-Lysine


Large-scale production of the feed amino acid L-lysine was established in the
1950s, using mutants from iterative rounds of random mutagenesis. Almost exclusively
produced by C. glutamicum [260], L-lysine is among the world's top-selling
amino acids, with an annual production volume of around 2.5 million tons.
Throughout decades of research and development, pioneering discoveries disclosed
a set of genetic targets for local metabolic engineering. The modifications can be
categorized according to their generic function: biosynthesis [242, 261] and
export [262], supply of carbon building blocks [100, 108, 113, 116] and redox
power [36, 54, 72], and competing reactions [91, 92, 102, 108]. Despite their
benefit for production, local engineering approaches failed to generate producer
strains for competitive industrial application, a limitation that was more recently
overcome by systems metabolic engineering. With a titer of 120 g/L, a yield of
55%, and a productivity of 4.0 g/L/h, the current benchmark of L-lysine
production has been achieved by comprehensive systems metabolic engineering.
Based on comparative in vivo and in silico flux studies, 12 genetic traits were
predicted to upgrade a non-producing wild type into a tailored hyper-producer
[55]. Beyond the obvious benefit of this design-based strategy for industrial lysine
production, the concept is highly promising for accelerating strain and process
development to bring novel products to the market.
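Titer and volumetric productivity are linked by simple arithmetic, which helps when comparing benchmark processes. The sketch below assumes that the reported productivity of 4.0 g/L/h is averaged over the whole fermentation. It then derives the implied run time and the daily output per cubic metre of broth for the lysine benchmark. The class name `ProcessKPIs` is hypothetical. The 55% yield is not used because its basis (mass or molar) is not stated here.

```python
from dataclasses import dataclass


@dataclass
class ProcessKPIs:
    titer_g_per_l: float            # final product concentration
    productivity_g_per_l_h: float   # volumetric productivity, assumed averaged over the run

    def implied_duration_h(self) -> float:
        """Run time implied by titer / average volumetric productivity."""
        return self.titer_g_per_l / self.productivity_g_per_l_h

    def daily_output_kg_per_m3(self) -> float:
        """Product formed per day; 1 g/L equals 1 kg/m^3."""
        return self.productivity_g_per_l_h * 24.0


lysine = ProcessKPIs(titer_g_per_l=120.0, productivity_g_per_l_h=4.0)
print(lysine.implied_duration_h())      # 30.0
print(lysine.daily_output_kg_per_m3())  # 96.0
```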


4.1.2 L-Arginine and L-Ornithine 


L-Arginine and L-ornithine are intermediates of the urea cycle and thus metabolically
closely related. L-Arginine is a semi-essential amino acid with both anabolic
and regulatory functions as one of the proteinogenic amino acids and has a pronounced
vasodilatory effect. L-Ornithine is credited with positive effects in the treatment
of liver diseases and in strengthening the heart [263]. Both amino
acids are natural products of C. glutamicum, and first reports on fermentative
production date back to the early years of amino acid fermentation [264-267].
Compared with L-lysine, the regulation of their biosynthesis is much more complex and
involves feedback inhibition of N-acetyl-glutamate kinase (ArgB) by arginine and
transcriptional control imposed by the arginine repressor ArgR [234, 268, 269]. The
strategy for generating a genetically defined arginine producer thus involved
genome breeding, that is, a comparative sequence analysis of wild type and
classical producers to identify potential regulatory mutations [108, 234]. Another
strategy, combining systems metabolic engineering and mutagenesis, recently yielded
a strain capable of producing 93 g L⁻¹ arginine with a yield of 0.4 g g⁻¹ glucose
[76]. Random mutagenesis thereby aimed at increased tolerance toward L-arginine
analogues, which effectively addresses pathway regulation. This was supported
by additional removal of repressors of the arginine operon. Further systems metabolic
engineering involved optimization of NADPH levels through promoter and
start codon engineering, disruption of the L-glutamate exporter to increase L-arginine
precursor supply, and flux optimization of rate-limiting L-arginine biosynthetic
reactions [76].
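Yields reported on a mass basis (g g⁻¹) are easier to compare across products once converted to a molar basis. The sketch below applies this conversion to the arginine figures above. It assumes that the 0.4 g g⁻¹ refers to L-arginine free base and anhydrous glucose, and that the yield holds for the whole run when estimating the glucose demand. The helper names are hypothetical.

```python
# Molar masses in g/mol (anhydrous glucose, L-arginine free base).
MOLAR_MASS = {"glucose": 180.16, "L-arginine": 174.20}


def mass_to_molar_yield(yield_g_per_g, product, substrate="glucose"):
    """Convert a mass yield (g product per g substrate) into mol per mol."""
    return yield_g_per_g * MOLAR_MASS[substrate] / MOLAR_MASS[product]


def substrate_consumed_g_per_l(titer_g_per_l, yield_g_per_g):
    """Substrate needed to reach a titer, assuming the yield holds for the whole run."""
    return titer_g_per_l / yield_g_per_g


print(round(mass_to_molar_yield(0.4, "L-arginine"), 3))  # 0.414
print(round(substrate_consumed_g_per_l(93.0, 0.4), 1))   # 232.5
```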

As with the L-arginine strategy, removal of pathway regulation was a key issue
for L-ornithine production [77, 263, 270]. Starting from basic producers, alternative
engineering strategies generated strains with good production performance. One
strategy combined evolutionary engineering with subsequent transcriptional
profiling. This revealed the upregulation of pgi (encoding glucose-6-phosphate
isomerase), pfkA (encoding 6-phosphofructokinase), gap (encoding
glyceraldehyde-3-phosphate dehydrogenase), pyk (encoding pyruvate kinase), pyc
(encoding pyruvate carboxylase), gltA (encoding citrate synthase), gdh (encoding
glutamate dehydrogenase), argB (encoding acetylglutamate kinase), and argJ
(encoding the bifunctional ornithine acetyltransferase/N-acetylglutamate synthase),
which together channel carbon from glucose via glutamate toward ornithine [263].
Expression changes of enzymes involved in redox metabolism pointed to the relevance
of NADPH supply [263, 271]. Overall, the combination of metabolic and evolutionary
engineering yielded a titer of 24 g L⁻¹ ornithine [263]. The final titer
was more than doubled (51.5 g L⁻¹) by another study relying completely on rational
design and engineering [77]. After removal of pathway regulation and
elimination of competing reactions, the biosynthetic gene cluster argCJBD was
overexpressed from a plasmid, and efficient NADPH supply was ensured through pentose
phosphate pathway engineering [77].


4.1.3 Ectoine 


In recent years, the pharmaceutical and cosmetics industries have started to exploit the
stabilizing and function-preserving effects of ectoines for health and hygiene
products [272]. Reflecting the natural function of these chemical chaperones as
protecting agents against high osmolarity or temperature, their biosynthesis in
natural producers is, in general, a stress response [273-275]. Current production
processes accordingly depend on provoking high salinity [272, 273, 276],
which requires expensive process equipment because of corrosive effects. To
overcome this drawback, lysine-producing C. glutamicum was genetically modified
for salinity-decoupled production of ectoines. This was achieved by genome-based
integration of the codon-optimized Pseudomonas stutzeri gene cluster ectABCD,
encoding 2,4-diaminobutyrate acetyltransferase (ectA), L-2,4-diaminobutyrate
transaminase (ectB), ectoine synthase (ectC), and ectoine hydroxylase (ectD),
into the ddh gene locus of C. glutamicum, with expression controlled by
the strong and constitutive promoter of elongation factor Tu [148].
Elimination of by-product formation through deletion of the lysine exporter and
subsequent bioprocess development allowed the production of 4.5 g L⁻¹ ectoine
with a considerable productivity of 6.7 g L⁻¹ day⁻¹ [148]. Proceeding further to
systems-wide engineering and integration of evolutionary strategies (here, the
previously described thermotolerance [248] appears most promising in light of the
better production performance at increased temperature [148]) can be expected to
generate improved producers, making ectoine production with C. glutamicum
increasingly attractive.


4.1.4 Terpenoids 


Many high-value products, including the anti-cancer drug precursor taxadiene, the
anti-malaria drug artemisinin, and the colorful carotenoids, belong to the substance
class of terpenoids [12]. In all cases, biosynthesis originates from the common
pathway intermediate isopentenyl pyrophosphate (IPP), which is further metabolized
to the respective product. Metabolic engineering of C. glutamicum for terpenoid
production has so far mainly focused on carotenoids. Wild-type strains already possess
native gene clusters, which were modified by deletion of the crtEb gene, encoding
lycopene elongase, and overexpression of crtE, crtB, and crtI, encoding prenyl
transferase, phytoene synthase, and phytoene desaturase, to establish lycopene
overproduction with a yield of 2.4 mg/g cell dry mass [258]. Extension of the
pathway and introduction of glycosyltransferases from different donor strains
allowed production of beta-carotene and zeaxanthin as well as glycosylated
derivatives thereof [259]. Moreover, C. glutamicum was engineered to produce
(+)-valencene. Production relied on heterologous expression of (+)-valencene synthase
from the sweet orange Citrus sinensis or from Nootka cypress, whereby improved
supply of the precursor farnesyl pyrophosphate (FPP) by additional overexpression
of the FPP synthase from E. coli or S. cerevisiae was crucial [277].


4.2 Platform Chemicals and Materials 


In our post-industrial era, shaped by petrochemistry, it is hardly possible to
imagine life without plastics. Driven by the need and desire to replace fossil
raw materials to achieve sustainability, bio-plastics are currently experiencing a
renaissance [12, 268, 278, 279]. For C. glutamicum, this includes fermentative
supply of chemical building blocks such as succinate [110, 280, 281],
diaminopentane (cadaverine) [197, 282], diaminobutane (putrescine) [107, 235],
lactate [280, 283], and propanediol [284, 285], as well as direct polymer production,
mainly of polyesters [286-288]. Here, we focus on engineering strategies for
diaminopentane and succinate, which are building blocks for high-value polyamide
production.


4.2.1 Succinate 


The relevance and attractiveness of bio-succinate become obvious from the great
effort of world-leading (bio)chemical companies including Myriant (with
ThyssenKrupp Uhde), BioAmber (joint venture with Mitsui & Co.), and Succinity
GmbH (joint venture of BASF SE & Corbion Purac) to establish industrial-scale
fermentation processes [12, 280]. Although not yet the focus of industrial production,
C. glutamicum has been engineered for high-level succinate production
[289]. Key modifications comprise overexpression [290] and feedback deregulation
[291] of pyruvate carboxylase to enhance anaplerotic carboxylation, which was
additionally stimulated by increasing the CO2 level via bicarbonate supplementation
[110, 290]. Formation of by-products was eliminated through deletion of
lactate [290, 291] and acetate [291] formation routes, and further improvement
was achieved by overexpression of glyceraldehyde 3-phosphate dehydrogenase
[291] and manipulation of redox metabolism [110, 291, 292]. Aerobic [293, 294]
and micro-aerobic [295] production processes are currently being evaluated, whereby
elimination of by-product formation, disruption of the TCA cycle downstream of
succinate, and overexpression of anaplerotic carboxylation seem most relevant for
production [294]. Further improvement was achieved by acetate recycling and
increased flux through the oxidative TCA cycle by amplified expression of citrate
synthase [293]. The resulting process is biphasic and comprises an aerobic growth
phase and an anaerobic production phase. The latter leads to titers of up to 146 g L⁻¹ [290]
and yields even surpassing 1.0 g g⁻¹ [291]. However, the additional time and raw
material needed to generate the cells through aerobic growth significantly reduce
the overall performance, which remains below that of natural anaerobic producers
[296, 297].
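A succinate mass yield above 1.0 g g⁻¹ may look surprising. A carbon balance shows why it is possible. The sketch below converts a mass yield into C-mol of product per C-mol of glucose. It assumes that the yield refers to succinic acid (free acid) and anhydrous glucose, and that no other carbon source contributes. Any carbon in the product beyond what glucose supplied must come from CO2 fixed by anaplerotic carboxylation, consistent with the role of pyruvate carboxylase and bicarbonate supplementation described above. The helper name `carbon_yield` is hypothetical.

```python
# Molar mass (g/mol) and carbon atoms per molecule.
GLUCOSE = {"molar_mass": 180.16, "carbons": 6}
SUCCINIC_ACID = {"molar_mass": 118.09, "carbons": 4}


def carbon_yield(yield_g_per_g, product, substrate=GLUCOSE):
    """Convert a mass yield into C-mol product per C-mol substrate."""
    mol_per_mol = yield_g_per_g * substrate["molar_mass"] / product["molar_mass"]
    return mol_per_mol * product["carbons"] / substrate["carbons"]


c_yield = carbon_yield(1.0, SUCCINIC_ACID)
# Carbon in the product beyond what glucose supplied must come from fixed CO2;
# this is a lower bound on net CO2 fixation (mol CO2 per mol glucose).
min_co2_per_glucose = max(0.0, c_yield - 1.0) * GLUCOSE["carbons"]
print(round(c_yield, 3))              # 1.017
print(round(min_co2_per_glucose, 3))  # 0.102
```

A carbon yield above 1 C-mol/C-mol is impossible from glucose carbon alone. Yields beyond 1.0 g g⁻¹ therefore directly indicate net CO2 incorporation.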


4.2.2 Diaminopentane 


The high research interest in diaminopentane (DAP) stems from its use as a
building block for polyamides, top-level industrial polymers with advanced
material properties and a current market volume of several million tons per year
[73, 278]. DAP is a naturally occurring degradation product of lysine, putting the
spotlight for process development on the industrial lysine producer C. glutamicum
[197, 268, 282]. As C. glutamicum does not possess natural lysine-degradation
pathways, DAP production in C. glutamicum was approached by heterologous
expression of the E. coli lysine decarboxylase variants CadA [282] and LdcC [197].
The host cell was pre-designed by feedback deregulation of lysine biosynthesis.
Though both enzyme variants enabled production, the CadA strategy suffered
from incomplete lysine conversion and seems inferior [282]. A further benefit
for production was achieved by debottlenecking the carbon flux through the terminal
biosynthesis, enhancing supply of the central building block oxaloacetate, and
expressing a codon-optimized ldcC gene under control of the strong tuf promoter
[197]. Metabolome analysis then revealed substantial secretion of N-acetyl-DAP
as a major by-product. This led to the discovery and deletion of a previously
unknown transacetylase catalyzing the undesired cross-reaction with the
non-natural metabolite DAP [298]. Additional engineering of product export
[299], combined with metabolic manipulation on a systems biology level comprising
attenuation of competing pathways and enhanced supply of redox power [73],
created a streamlined cell factory that converts more than 40% of the consumed
glucose into diaminopentane during exponential growth in batch culture [73].
Implementation of a fed-batch process raised production to a molar yield of 50% with
a maximum titer of 88 g L⁻¹ and a space-time yield of 2.2 g L⁻¹ h⁻¹ [73]. The
generation of the advanced DAP producer represents a breakthrough for providing
fully bio-based polyamides such as PA5.4 (copolymerization with succinic acid)
and PA5.10 (copolymerization with sebacic acid from castor oil). The latter was
successfully manufactured as a polymer in pure and glass-fiber-reinforced form
[73]. The excellent material properties of the novel bio-based PA5.10, surpassing
those of the conventional petrochemical nylons PA6.6 and PA6, are encouraging.


4.3 Biofuels 


Though C. glutamicum is not a typical biofuel producer, some attempts
have been made to establish biofuel production, mainly processes for ethanol [111]
and isobutanol [103, 300].


4.3.1 Ethanol 


In C. glutamicum, ethanol fermentation relies strictly on heterologous genes.
By expressing the pyruvate decarboxylase (pdc) and alcohol dehydrogenase (adhB)
genes from Zymomonas mobilis under control of the promoter of the lactate
dehydrogenase gene (ldhA), a basic producer was generated [111]. Subsequent
deletion of the genes ldhA and ppc, encoding phosphoenolpyruvate carboxylase,
avoided formation of the major by-products lactate and succinate. Under oxygen
limitation, a yield of 0.53 g g⁻¹ was achieved, though at low product concentration,
likely related to tolerance issues [12, 111]. The ethanol production rate was,
however, substantially higher than that of many other bacteria reported so far
[8]. At a high cell density of 60 g L⁻¹ cell dry weight, growth-arrested cells
produced ethanol at a rate of 30 g L⁻¹ h⁻¹ [111]. This appears to be a promising
starting point for further improvement. Given the antiseptic activity of
ethanol, tolerance is one of the key issues. Recent laboratory evolution experiments
proved valuable in conferring tolerance to methanol [249] and accompanying
tolerance to thermal and solvent stress [248]. Identification of the underlying
causes and subsequent re-engineering [249] seems a reasonable approach for
next-level ethanol producers.


4.3.2 Isobutanol 


Strain engineering for efficient isobutanol production has profited substantially
from the knowledge gained from branched-chain amino acid fermentation
[3, 103, 119, 301]. The isobutanol strategy [103] thus relied on a previous strain
design for valine-producing C. glutamicum [101, 302, 303]. The complete pathway
design comprised alsS (acetohydroxy acid synthase from B. subtilis) and ilvCD
(acetohydroxyacid isomeroreductase and dihydroxyacid dehydratase from
C. glutamicum), along with downstream genes for the subsequent decarboxylation
(kivd, encoding ketoacid decarboxylase from L. lactis) and reduction (adhA,
encoding alcohol dehydrogenase from C. glutamicum) of 2-ketoisovalerate to
isobutanol [103]. In a Δpyc Δldh background, this enabled production of 4.9 g L⁻¹
isobutanol. Higher-level production relied on a 2-ketoisovalerate production strain that
was additionally modified by inactivation of lactate and malate dehydrogenases,
implementation of ketoacid decarboxylase from Lactococcus lactis and alcohol
dehydrogenase (ADH2) from S. cerevisiae, and expression of the pntAB transhydrogenase
gene from E. coli [300]. The highest titer so far, 73 g L⁻¹, was achieved by an
approach that, in addition to metabolic engineering, also considered tolerance issues,
which were addressed by continuous solvent extraction during
fermentation [304].
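The cofactor balance of this pathway helps explain why it benefits from cofactor engineering. The sketch below sums simplified reaction stoichiometries from glucose to isobutanol under stated assumptions:

- glycolysis yields two NADH per glucose;
- the isomeroreductase (IlvC) uses NADPH;
- the alcohol dehydrogenase uses NADH;
- ATP, water, protons and phosphate are omitted;
- the transhydrogenase is written as a plain NADH-to-NADPH hydride transfer, ignoring its coupling to the proton motive force.

This is an illustrative balance, not the flux model of the cited studies. It shows only one plausible rationale for expressing pntAB. The reaction keys are hypothetical labels.

```python
from collections import Counter

# Simplified stoichiometries (negative = consumed, positive = produced).
# Cofactor preferences are assumptions; ATP, H2O, H+ and Pi are omitted.
PATHWAY = {
    "glycolysis": {"glucose": -1, "NAD+": -2, "NADH": 2, "pyruvate": 2},
    "alsS": {"pyruvate": -2, "acetolactate": 1, "CO2": 1},
    "ilvC": {"acetolactate": -1, "NADPH": -1, "NADP+": 1, "dihydroxyisovalerate": 1},
    "ilvD": {"dihydroxyisovalerate": -1, "2-ketoisovalerate": 1},
    "kivd": {"2-ketoisovalerate": -1, "isobutyraldehyde": 1, "CO2": 1},
    "adh": {"isobutyraldehyde": -1, "NADH": -1, "NAD+": 1, "isobutanol": 1},
}

# Idealized transhydrogenase: NADH + NADP+ -> NAD+ + NADPH.
PNTAB = {"NADH": -1, "NADP+": -1, "NAD+": 1, "NADPH": 1}


def net_balance(reactions):
    """Sum reaction stoichiometries and drop species that cancel out."""
    total = Counter()
    for stoich in reactions:
        total.update(stoich)  # update() adds counts, negative values included
    return {species: n for species, n in total.items() if n != 0}


without_th = net_balance(PATHWAY.values())
with_th = net_balance(list(PATHWAY.values()) + [PNTAB])
print("without transhydrogenase:", without_th)
print("with transhydrogenase:   ", with_th)
```

Without the transhydrogenase, each isobutanol leaves one surplus NADH and one unmet NADPH demand. Adding the idealized transhydrogenase closes both, and the net reaction becomes glucose → isobutanol + 2 CO2.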


4.4 Toward Non-food Substrates 


Strategies to achieve sustainability are strongly driving new biosynthetic chemistry
processes from renewable feedstocks [4]. Traditionally, however, biotechnology
builds on glucose and starch as fermentation substrates, raw materials that also
serve human nutrition. Beyond these feedstocks, there are vast
amounts of other, so far unused, bio-based substrates, including lignocellulosic
biomass, natural oils, and waste streams from different industries [289]. To make
them bioavailable for C. glutamicum, assimilation routes were implemented
to establish processes from diverse sugars [42, 44, 48, 305-308], oligo- and polymers
thereof [309-312], alcohols [313], sugar alcohols [314, 315], organic acids
[14], and green juices [15, 237]. Pentoses, one of the major novel raw materials,
are discussed below. In addition, interesting developments toward direct use of
polymeric raw materials in one-step consolidated bioprocesses are described.


4.4.1 Pentoses


As major constituents of hemicellulose, pentose sugars are highly abundant on
Earth and are therefore valuable and relevant as alternative fermentation
feedstocks [4]. To enable growth of C. glutamicum on xylose, heterologous
expression of a single gene, xylA, encoding xylose isomerase, is sufficient, though
additional overexpression of xylB, encoding xylulokinase, supports xylose
assimilation [44]. Using this strategy, diaminopentane can be produced by modified
C. glutamicum strains from xylose as sole carbon source and, beyond its pure
form, from xylose-containing sugar mixtures obtained from hemicellulose
hydrolyzates [48]. Integrated analysis of the physiological response to xylose fermentation
at the level of the transcriptome and in vivo fluxes provided new insights into xylose
metabolism and unraveled further optimization targets for systems-wide engineering
[47]. A superior strain created from these findings efficiently converted xylose into
diaminopentane with a yield of 32% and a titer of 60.1 g L⁻¹ [47]. Arabinose-based
production was established using the recombinant arabinose operon of E. coli
[42, 43]. In addition to extending the substrate spectrum, a major issue is
tolerance against toxic substances, such as furfural, hydroxymethylfurfural (HMF),
and phenol, which are typically present in lignocellulosic feedstocks after pre-treatment
[4, 316, 317]. C. glutamicum has a natural capacity for detoxification of furfural
[318], which was recently associated with the fudC gene, conferring the ability to
reduce furfural to furfuryl alcohol [319]. Enhanced robustness can also be
conferred by overexpression of the mycothiol glycosyltransferase gene mshA [320]. In the
future, further discovery and manipulation of tolerance mechanisms will remain
essential for realizing lignocellulose-based biorefineries with C. glutamicum.


4.4.2 Sugar Polymers 


The sugars commonly used as fermentation substrates are naturally bound in homo- or
heteropolymers with specific, polymer-dependent composition and branching,
which in general requires pretreatment before use [321]. This was overcome in
recombinant C. glutamicum strains that express polymer-degrading pathways from
diverse donor strains for direct utilization of starch [309], cellobiose [322-324],
cellulose [325, 326], and lignocellulose [327]. As most polymers are not
transported into the cell, the hydrolyzing enzymes must act in the extracellular
environment. This was addressed by different approaches involving either secretory
systems for releasing soluble enzymes into the medium [310, 325] or cell-surface
display of the degrading enzymes [311, 322, 327, 328]. The enzyme set mainly
comprises hydrolases such as α-amylase [309-311, 328, 329], β-glucosidase
[322, 323], cellulases [327], and endoglucanase [325, 326]. However, hydrolysis
rates are usually highest at higher temperature and lower pH than the
optimal growth conditions of C. glutamicum [48, 330], favoring strains
with increased tolerance obtained through metabolic and evolutionary engineering
[248, 331]. Recent and current studies also aim at substrate co-utilization toward
implementation of consolidated bioprocesses [332, 333].


5 Conclusions 


We are currently standing at a turning point from a petroleum-based industry to a 
bio-based economy. Riboflavin production is an impressive and promising exam- 
ple, where petrochemical production was completely replaced by bio-based pro- 
duction with optimized strains of Bacillus subtilis and Ashbya gossypii within only 
30 years [230]. The strong foundation for this sustainable development has been 
built since the discovery and establishment of microbial fermentation processes 
[10]. Throughout the last few decades, innovations in strain engineering, automation,
and mechanization have provided invaluable tools to face upcoming
challenges. Most encouragingly, systems and synthetic metabolic engineering have
provided novel and alternative routes for producing chemicals, materials, nutrients,
and health care products from renewable raw materials [12, 280, 334]. Success
stories from engineering C. glutamicum toward production of lysine [55], arginine 
[76], diaminopentane [73], and valine [335] give substantial encouragement. Some 
of these products have already become established in the market, some with 
decades-lasting tradition, whereas other innovative products need to prove their 
benefit and value to advance from feasibility to commercialization.
Beyond pathway engineering, open challenges arise from large-scale
industrial production. These fermentations typically
have their own peculiarities, such as nutrient, pH, oxygen, and temperature gradients
caused by imperfect mixing, as well as complex, partly toxic raw materials. Strain
robustness and vitality are therefore key targets to be addressed more precisely in
the future. Evolutionary approaches to address the required multi-target cellular 
responses to overcome process stress have recently been found to be valuable for 
improving the tolerance of C. glutamicum [159, 248, 249] and production
performance [250, 271]. By integrating evolutionary adaptation and re-engineering
within the concept of systems and synthetic metabolic engineering, we can expect a
new level of strain engineering that moves us forward to a bio-based economy.
Synergizing ¹³C Metabolic Flux Analysis
and Metabolic Engineering for Biochemical
Production


Weihua Guo, Jiayuan Sheng, and Xueyang Feng 


Abstract Metabolic engineering of industrial microorganisms to produce
chemicals, fuels, and drugs has attracted increasing interest as it provides an
environmentally friendly and renewable route that does not depend on dwindling
petroleum sources. However, microbial metabolism is so complex that metabolic
engineering efforts often have difficulty in achieving a satisfactory yield, titer,
or productivity of the target chemical. To overcome this challenge, ¹³C Metabolic
Flux Analysis (¹³C-MFA) has been developed to rigorously investigate cell
metabolism and quantify the carbon flux distribution in central metabolic pathways.
In the past decade, ¹³C-MFA has been widely used in academic labs and the
biotechnology industry to pinpoint the key issues related to microbial-based chemical
production and to guide the development of appropriate metabolic engineering
strategies for improving biochemical production. In this chapter we
introduce the basics of ¹³C-MFA and illustrate how ¹³C-MFA has been applied
synergistically with metabolic engineering to identify and tackle the rate-limiting steps
in biochemical production.


Keywords Biofuels, Bottleneck, Cell metabolism, Cofactor imbalance, Isotope, 
Synthetic biology 
1 Introduction


Producing chemicals from renewable resources would reduce both the dependence on
petroleum and the damage to the environment. Recently, with the development of
metabolic engineering and synthetic biology, microbial production of a wide
range of bulk chemicals [1-4], biofuels [5-9], and drugs [10-17] from renewable
feedstock has been achieved with many industrial microorganisms
such as Escherichia coli [18-23] and Saccharomyces cerevisiae [24-27]. Among
all the biosynthesized chemicals, however, only a few have achieved a satisfactory
production level with a titer, yield, and productivity high enough for industrial
commercialization [28, 29]. Therefore, it is crucial to develop novel strategies in
metabolic engineering to improve microbial-based chemical production.

One of the main reasons for the low production levels of engineered
microorganisms is the complexity of cell metabolism [29]. Microbial production of chemicals
is more than converting precursors to products. Rather, microbial
metabolism needs to coordinate carbon flux [30, 31], cofactor supply [32-34],
cell maintenance [10, 35, 36], and other factors [37-40] to achieve the production of
target chemicals at a high level. The metabolic engineering strategies adopted to
manipulate microbial metabolism often focus on only a few known challenges (e.g.,
poor gene expression) and may also introduce new problems (e.g., metabolic burden)
that prevent the microorganisms from achieving high-level chemical production.
Such complex behavior of microbial physiology presents one of the biggest
obstacles in current microbial-based chemical production.

To elucidate the metabolic rewiring of microorganisms and, more importantly,
derive the appropriate strategy to engineer microorganisms for biochemical
production, a technology named ¹³C Metabolic Flux Analysis (¹³C-MFA) was developed in
the 1990s [41-46]. In essence, ¹³C-MFA uses carbon isotopes to trace cell
metabolism and employs mathematical modeling to uncover the carbon flux
distributions in metabolic networks of microorganisms [41, 42, 47-50]. By comparing
metabolic fluxes among different engineered microorganisms, key
issues, such as bottleneck pathways, can often be discovered, which then guides
bioengineers in developing more appropriate metabolic engineering strategies [36, 51-56]
to improve chemical production. In the past decade, we have witnessed many
successful applications of ¹³C-MFA helping metabolic engineers improve the microbial
production of chemicals [31, 35, 52, 54, 57, 58], and ¹³C-MFA has been widely
recognized as one of the most important tools to diagnose microbial metabolism and
develop novel metabolic engineering strategies [30, 31, 33, 35, 36, 54, 59-61].

In this chapter we summarize the synergistic tactics of ¹³C-MFA and metabolic
engineering from cases of improving microbial-based chemical production in the
past decade. We first briefly introduce the principle of ¹³C-MFA, and then categorize
the ways in which ¹³C-MFA synergizes with metabolic engineering into four
groups: (1) uncovering the bottleneck steps in biochemical production, (2) identifying
cofactor imbalance issues of host metabolism, (3) revealing cell maintenance
requirements of industrial microorganisms, and (4) elucidating the mechanism of
microbial resistance to fermentation inhibitors. We also point to emerging areas
where breakthroughs in ¹³C-MFA could potentially benefit rational metabolic
engineering for improving microbial-based chemical production in the near future.


2 Technology Platform of ¹³C Metabolic Flux Analysis


The technology platform of ¹³C-MFA was first developed in the 1990s [41-46]. In
the past two decades, mathematical algorithms and high-throughput mass
spectrometry technology have developed rapidly and have enabled more accurate
quantitative analyses of metabolic fluxes for a broad range of species. Because
several protocols have been published to describe the procedures for both model
and non-model organisms [47, 62], we focus on providing a concise introduction
to ¹³C-MFA, which includes cell culture and fermentation, isotopic analysis of
metabolites, and ¹³C-assisted pathway and flux analysis (Fig. 1).


2.1 Cell Culture and Fermentation 


Cell culture on ¹³C-labeled carbon substrates is the first step of ¹³C-MFA and plays
a vital role in the entire analysis. Three key factors have been recognized in this
step, namely the composition of the medium, the cultivation mode, and the
selection of ¹³C-labeled substrates.

First, a strictly minimal medium with a single carbon source is often required for
the ¹³C-labeling experiments. This is because microorganisms can assimilate multiple
carbon substrates and unlabeled nutrients, which “dilutes” the
isotopic labeling of key metabolites, obscures the carbon fate of the metabolites of
interest, and increases the difficulty of both carbon consumption measurements
and accurate flux calculations [63]. However, it is worth mentioning that for certain
genetically engineered strains (e.g., S. cerevisiae) with auxotrophic markers, a trace
amount of unlabeled exogenous amino acids can be supplemented into the
minimal medium in order to support cell growth. Recently, several studies have
reported alternative approaches for calculating intracellular fluxes with complex
medium composition and/or additional nutrients [64, 65].

Fig. 1 Technology platform for ¹³C-MFA

Second, ¹³C-MFA traditionally focuses on metabolic flux distributions at
metabolic steady state, which requires both metabolic and isotopic steady states of
the microorganisms, that is, the concentrations and isotopic labeling of intracellular
metabolites do not change. Such requirements can be met by culturing
microorganisms in either of two modes: (1) batch mode, often using shake flasks or
culture tubes to culture microorganisms and harvesting biomass samples in the log
growth phase as a “pseudo” metabolic and isotopic steady state, and (2) chemostat
mode, often using bioreactors with continuous feeding to culture microorganisms
and harvesting biomass samples after two to three generations as the “real” metabolic
and isotopic steady state. Although a chemostat setup can precisely control the
desired metabolic status for metabolic flux analysis, the batch mode is simpler and
more cost-effective. To date, most ¹³C-MFA studies in academic labs have
been accomplished by sampling the ¹³C-labeled biomass in the late-log or early
stationary growth phase when culturing microorganisms in batch mode [62].
Several advanced ¹³C-assisted flux analysis approaches can also be implemented at
either metabolic non-steady state [30] or isotopic non-steady state [66-70] to
uncover the kinetic behavior of intracellular metabolic rewiring by using novel
computational tools (discussed in Sects. 4.2 and 4.3).
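A simple washout model puts the chemostat sampling criterion in numbers. The sketch assumes an ideally mixed chemostat at steady state, where the specific growth rate equals the dilution rate D. It further assumes that biomass formed before the switch to labeled feed is only diluted out (no protein turnover) and that all newly formed biomass carries the steady-state label. Under these assumptions, the unlabeled fraction after n generations is 2^-n. The function names and the example dilution rate are hypothetical.

```python
import math


def unlabeled_fraction(hours, dilution_rate_per_h):
    """Fraction of biomass formed before the label switch: exp(-D * t)."""
    return math.exp(-dilution_rate_per_h * hours)


def hours_for_generations(n_generations, dilution_rate_per_h):
    """At steady state mu = D, so one doubling takes ln(2) / D hours."""
    return n_generations * math.log(2) / dilution_rate_per_h


D = 0.1  # 1/h, hypothetical dilution rate
for n in (2, 3, 5):
    t = hours_for_generations(n, D)
    print(f"{n} generations: {t:.1f} h, unlabeled biomass {unlabeled_fraction(t, D):.1%}")
```

In this idealized model, the number of generations chosen sets an upper bound on the unlabeled biomass that can remain in the sample. That bound is 25% after two generations and 12.5% after three.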
Third, the choice of ¹³C-labeled substrate for ¹³C-MFA is
case-specific. In general, the traditional ¹³C-labeled glucose composition, that is, 80 wt%
[1-¹³C] glucose and 20 wt% [U-¹³C] glucose, can easily introduce sufficient ¹³C carbons
into the metabolites of interest for accurate mass spectrometry analysis and further
flux analysis [47, 71-74]. On the other hand, pure and singly labeled carbon
substrates are more sensitive for detecting novel pathways because it is easier
to trace labeled carbons in intermediate metabolites. For example, [3-¹³C]
lactate was used to investigate the biofilm metabolism of Shewanella oneidensis
MR-1 and revealed the heavy use of C1 metabolism in biofilm cells [75]. In brief,
¹³C was expected to accumulate in most metabolites because it is
difficult to remove the labeled carbon of lactate from S. oneidensis cells
through well-known central pathways such as the TCA cycle. However, the high
concentration of unlabeled metabolites in the biofilm cells indicated high
activity of C1 metabolism, which was the only known metabolic pathway to release
the labeled carbon of lactate from S. oneidensis cells as CO₂. Additionally,
multiple ¹³C tracers are sometimes used because they can improve flux
resolution.
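The mixing ratio of the glucose tracer described above fixes the substrate's own mass distribution and its average ¹³C enrichment. The sketch below makes three simplifying assumptions: 100% isotopic purity of each tracer, wt% values treated as mole fractions, and no natural ¹³C. The function name is hypothetical.

```python
def tracer_mix_mdv(n_carbons, components):
    """Mass distribution vector (M+0 ... M+n) of a tracer mixture.

    components: list of (mole_fraction, number_of_13C_atoms) pairs.
    Assumes 100% isotopic purity and ignores natural 13C.
    """
    if abs(sum(f for f, _ in components) - 1.0) > 1e-9:
        raise ValueError("mole fractions must sum to 1")
    mdv = [0.0] * (n_carbons + 1)
    for fraction, n_labeled in components:
        mdv[n_labeled] += fraction
    return mdv


# 80% [1-13C] glucose (one labeled carbon) + 20% [U-13C] glucose (six labeled carbons)
mix = tracer_mix_mdv(6, [(0.8, 1), (0.2, 6)])
enrichment = sum(i * f for i, f in enumerate(mix)) / 6  # average 13C per carbon
print(mix, f"{enrichment:.3f}")
```

Under these assumptions, only the M+1 and M+6 species are present, and roughly one third of all substrate carbons carry ¹³C.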


2.2 Isotopic Analysis of Metabolites 


Experimental measurements of the isotopic labeling of ¹³C-labeled metabolites, for
example, proteinogenic amino acids, are often achieved by mass spectrometry,
which detects the fractions of the total population of any molecular fragment that
are unlabeled, singly labeled, doubly labeled, etc. By correcting for the effects of
naturally occurring isotopes on the analysis of ¹³C-labeled metabolites, we can obtain
the isotopic distributions of the metabolites of interest with high sensitivity and use
them as isotopic “fingerprints” to determine the metabolic fluxes [47, 76, 77].
Generally, three major procedures are used to obtain such isotopic
“fingerprints”: metabolite extraction and separation, isotopic labeling detection, and
correction for natural isotopomers.

Many of the candidate metabolites for flux analysis need to be
extracted from cell biomass or culture medium. Intracellular metabolites sometimes
have low abundance and stability. Thus, a rapid metabolite quenching
method and sensitive mass spectrometry are often used to collect the isotopic
labeling data [78]. The extracted metabolites can either be treated with a low-heat
derivatization reagent, N,O-bis(trimethylsilyl)trifluoroacetamide (BSTFA) or
N-methyl-N-(trimethylsilyl)trifluoroacetamide (MSTFA), followed by analysis in gas
chromatography-mass spectrometry (GC-MS) [43], or directly injected without
any treatment into liquid chromatography-mass spectrometry (LC-MS) [79], an
instrument that has much higher sensitivity than GC-MS.

For metabolites with high abundance and stability, such as over-produced
chemicals [79] and proteinogenic amino acids, the quenching step can be
bypassed. Instead, the samples are often treated with N-tert-butyldimethylsilyl-N-methyltrifluoroacetamide
(MTBSTFA), an inexpensive and commonly used derivatization
reagent, in a high-heat process, followed by GC-MS analysis of the isotopic labeling.
The derivatization process renders the molecules volatile enough to enter the GC
column but also introduces considerable amounts of naturally occurring isotopes.
Therefore, the raw mass isotopomer spectrum must be systematically corrected
prior to flux calculation. Several well-established algorithms [80-82]
curate the isotopic labeling and remove the effects of natural isotopes
so that a mass distribution vector (MDV) for each metabolite can be generated and
directly used for pathway and flux analysis.
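The natural-isotope correction can be sketched as a small linear system. The example below is a simplified, carbon-only version of the correction-matrix idea and makes these assumptions:

- the natural ¹³C abundance is about 1.07%;
- only the carbon backbone of the metabolite is considered;
- the system is solved by least squares.

A real MTBSTFA-derivatized fragment would also need corrections for the derivatization atoms and for Si, N, O and H isotopes. This sketch is not a reimplementation of the algorithms cited in [80-82]. The function names and the example MDV are hypothetical.

```python
from math import comb

import numpy as np

P_13C = 0.0107  # assumed natural abundance of 13C


def carbon_correction_matrix(n_carbons, p=P_13C):
    """C[k, j]: probability that a molecule with j tracer-derived 13C atoms
    is observed at M+k because k - j of its other carbons are naturally 13C."""
    n = n_carbons
    C = np.zeros((n + 1, n + 1))
    for j in range(n + 1):
        free = n - j  # carbons not labeled by the tracer
        for k in range(j, n + 1):
            extra = k - j
            C[k, j] = comb(free, extra) * p**extra * (1 - p) ** (free - extra)
    return C


def correct_mdv(measured, n_carbons):
    """Least-squares estimate of the tracer-derived MDV, renormalized to sum 1."""
    C = carbon_correction_matrix(n_carbons)
    corrected, *_ = np.linalg.lstsq(C, np.asarray(measured, dtype=float), rcond=None)
    corrected = np.clip(corrected, 0.0, None)  # remove small negative noise
    return corrected / corrected.sum()


# Round trip for a hypothetical 3-carbon fragment.
true_mdv = np.array([0.5, 0.3, 0.15, 0.05])
measured = carbon_correction_matrix(3) @ true_mdv  # simulate natural-isotope skew
print(np.round(correct_mdv(measured, 3), 4))
```

Each column of the matrix is a binomial distribution and therefore sums to one. The matrix is lower-triangular with a nonzero diagonal, so in this noise-free round trip the correction recovers the original MDV exactly.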


2.3 ¹³C-Assisted Pathway and Flux Analysis


Based on the corrected MDV, the metabolic behavior of microorganisms can be
elucidated both qualitatively (i.e., pathway analysis) and quantitatively (i.e., flux
analysis). On the one hand, ¹³C-assisted pathway analysis often aims to answer
whether a metabolic pathway is active in non-model microorganisms by measuring
the ¹³C-labeled patterns (i.e., MDVs) in key metabolites and determining which of the
candidate biochemical pathways supply the carbon for biomolecule synthesis. One example of
¹³C-assisted pathway analysis is the discovery of C1 metabolism in biofilm
S. oneidensis mentioned above. On the other hand, ¹³C-assisted flux analysis
aims to quantify the carbon fluxes in multiple metabolic pathways by simulating the
¹³C-labeled patterns (i.e., MDVs) in key metabolites and searching for the “real”
metabolic fluxes that lead to the best fit of the measured ¹³C-labeled patterns.
Such quantitative analysis often reveals network-level rewiring of carbon
fluxes in industrial workhorses (e.g., E. coli) when engineered for biochemical
production. In short, ¹³C-assisted pathway analysis is suitable for pathway
discovery in non-model microorganisms, whereas ¹³C-assisted flux analysis is more useful
for identifying metabolic rewiring in industrial microorganisms.
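The “best fit” search can be illustrated on the smallest possible case. In this hypothetical toy problem, one metabolite is formed by two converging reactions whose products carry different, known labeling patterns. The measured MDV is modeled as a flux-weighted average of the two patterns, and the split ratio that minimizes the squared residual has a closed form. Full-network tools work differently: they simulate many fluxes at once, for example with EMU-based models and nonlinear optimization. The function name and MDVs are hypothetical.

```python
import numpy as np


def fit_split_ratio(measured, mdv_route_a, mdv_route_b):
    """Least-squares fraction f of flux through route A, assuming
    measured = f * mdv_route_a + (1 - f) * mdv_route_b with 0 <= f <= 1."""
    m, a, b = (np.asarray(x, dtype=float) for x in (measured, mdv_route_a, mdv_route_b))
    d = a - b
    f = float(np.dot(m - b, d) / np.dot(d, d))  # minimizes ||m - b - f*d||^2
    f = min(max(f, 0.0), 1.0)  # fluxes cannot be negative; clip to [0, 1]
    residual = float(np.sum((m - b - f * d) ** 2))
    return f, residual


# Hypothetical MDVs (M+0..M+3) of a 3-carbon product formed by two routes.
route_a = [0.1, 0.7, 0.15, 0.05]
route_b = [0.6, 0.1, 0.2, 0.1]
measured = 0.3 * np.array(route_a) + 0.7 * np.array(route_b)
print(fit_split_ratio(measured, route_a, route_b))
```

The residual is a convex quadratic in f, so clipping the unconstrained optimum to [0, 1] gives the constrained optimum. A large remaining residual would suggest that the assumed network (here, two routes) is incomplete.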

In the past decade, ¹³C-assisted flux analysis has been widely applied to uncover
the central metabolism of various species. According to a recently developed curated
database [82] that collects the central carbon metabolic flux distributions
investigated by ¹³C-MFA, over 500 metabolic flux analyses have been accomplished so
far for 36 organisms (Fig. 2). Most ¹³C-MFA studies focus on the
metabolism of E. coli and S. cerevisiae. However, there is a trend toward more
¹³C-MFA studies of other industrial microorganisms, such as Clostridium and
cyanobacteria, because of their importance in biochemical production and
their less well-characterized cell metabolism. Also, with the wide application of
¹³C-MFA, many ¹³C-MFA software packages, such as OpenFLUX2 [83], 13CFLUX2
[84], Metran [85], INCA [86], FiatFLUX [87], and Biomet Toolbox 2.0 [88], have
been developed that use highly efficient mathematical algorithms (e.g., elementary
metabolite units, EMU [85, 89]) to simulate ¹³C-labeled patterns and calculate
carbon fluxes in metabolic networks (Table 1). Thus, some of the difficulties of
¹³C-MFA, especially the computational load, have been dramatically reduced.
It is reasonable to believe that the number of ¹³C-MFA studies could increase by
orders of magnitude in the next decade or two.

Fig. 2 Summary of current ¹³C-MFA studies on different organisms


3 Synergy of ¹³C Metabolic Flux Analysis and Metabolic
Engineering


The ultimate goals of metabolic engineering are to design and build engineered
biological systems that can produce chemicals, materials, food, and drugs at high
yield [90]. However, the lack of fundamental understanding of cellular responses
during industrial fermentation often prevents metabolic engineers from achieving
satisfactory biochemical production. In the past decade, ¹³C-MFA has been
widely used to provide insightful information about microbial metabolism and
has successfully helped metabolic engineers to improve biochemical production.
Here, we summarize recent successes in synergizing ¹³C-MFA and metabolic
engineering (Tables 2-4) and organize them into four categories:
(1) uncovering the bottleneck steps in biochemical production, (2) identifying
cofactor imbalance issues of host metabolism, (3) revealing cell maintenance
requirements of industrial microorganisms, and (4) elucidating the mechanism of
microbial resistance to fermentation inhibitors.
3.1 Uncovering the Bottleneck Steps in Biochemical
Production 


3.1.1 Bottlenecks in Acetyl-CoA Synthesis 


As a central metabolite, acetyl-CoA plays an important role in a series of cellular 
functions. In metabolic engineering, acetyl-CoA is a key precursor in the biosyn- 
thesis of sterols, amino acids, fatty acid-derived chemicals, polyketides, and 
isoprenoid-derived drugs [51]. To meet cellular requirements,
organisms use a variety of routes for acetyl-CoA synthesis (Fig. 3), such as the
oxidative decarboxylation of pyruvate, the oxidation of long-chain fatty acids, and
the oxidative degradation of certain amino acids. The most common way to produce
acetyl-CoA is the direct conversion of pyruvate by pyruvate dehydrogenase
(PDH) [91, 92], or, under anaerobic conditions, by pyruvate ferredoxin oxidoreductase (PFO), pyruvate:NADP⁺
oxidoreductase (PNO) [93], or pyruvate formate lyase (PFL) [94]. Other acetyl-CoA synthesis pathways, for example,
acetyl-CoA synthetase (ACS) [95] and ATP-citrate lyase (ACL) [96], also play important
roles in acetyl-CoA supply in different organisms, especially in supplying cytosolic
acetyl-CoA as the precursor for various biochemical products.

Fig. 3 Case studies that identify key bottleneck steps of biochemical production via ¹³C-MFA and
the corresponding metabolic engineering strategies. (a) n-Butanol biosynthesis in S. cerevisiae. (b)
Fatty acid synthesis in E. coli. Please note that the pathways shown are schematic and
there could be missing pathways. Abbreviations: G6P Glucose 6-phosphate, F6P Fructose
6-phosphate, 6PG 6-Phosphogluconate, Ru5P Ribulose 5-phosphate, X5P Xylulose 5-phosphate,
R5P Ribose 5-phosphate, E4P Erythrose 4-phosphate, PEP Phosphoenolpyruvate, SA Shikimic
acid, MAA Mycosporine-like amino acids, AA Amino acids, PGI Phosphoglucose isomerase,
G6PDH G6P dehydrogenase, TKL Transketolase, ARO Pentafunctional protein Aro1p, DHAP
Dihydroxyacetone phosphate, G3P Glyceraldehyde 3-phosphate, Glyc Glycerol, AceP Acetyl-P,
EtOH Ethanol, Pyr Pyruvate, ACAL Acetaldehyde, Ace Acetate, AcCoA Acetyl-CoA, OAA
Oxaloacetate, CIT Citrate, ICIT Isocitrate, SUC Succinate, MAL Malate, Glox Glyoxylate, GPD
Glycerol-3-phosphate dehydrogenase, XpkA Phosphoketolase, ACK Acetate kinase, PDC Pyruvate
decarboxylase, ADH Alcohol dehydrogenase, PDH Pyruvate dehydrogenase, cyto-PDH Cytosolic
pyruvate dehydrogenase, ACS Acetyl-CoA synthetase, ACL ATP citrate lyase, ICL Isocitrate
lyase, MLS Malate synthase, MaCoA Malonyl-CoA, ACC Acetyl-CoA carboxylase, FAS Fatty
acid synthesis enzymes, FAD Fatty acid degradation enzymes

To investigate acetyl-CoA biosynthesis in living cells, ¹³C-MFA was used to
compare S. cerevisiae strains growing under purely oxidative, respiro-fermentative,
and predominantly fermentative conditions [97]. Based on the flux distributions, the
pyruvate bypass pathway, that is, the conversion of pyruvate to acetaldehyde and
then to acetate for synthesis of cytosolic acetyl-CoA, was found to be the main
pathway used by S. cerevisiae to supply cytosolic acetyl-CoA. However, the flux in
the pyruvate bypass pathway was not strong enough to supply sufficient cytosolic
acetyl-CoA when engineering S. cerevisiae to produce acetyl-CoA-derived
chemicals, such as n-butanol. To increase the capability of producing cytosolic
acetyl-CoA in S. cerevisiae, various metabolic engineering strategies have been
adopted to further enhance cytosolic acetyl-CoA availability, including the
disruption of competing pathways [51] and the introduction of heterologous biosynthetic
pathways with higher catalytic efficiency and lower energy input requirements, such
as cytosolically localized PDHs (cytoPDHs) [51] and ATP-citrate lyases (ACLs) [96]. In
one of the studies that evaluated the effects of various acetyl-CoA synthesis
pathways on n-butanol production, cytoPDHs were found to work best, leading
to a threefold increase in n-butanol production in engineered S. cerevisiae (Fig. 3a
and Table 2) [51].
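The “lower energy input” argument can be made concrete by counting ATP equivalents per cytosolic acetyl-CoA at each route's final activating step. The sketch below uses textbook stoichiometries. Acetyl-CoA synthetase converts ATP to AMP and pyrophosphate, counted as two ATP equivalents. ATP-citrate lyase converts ATP to ADP, counted as one equivalent. PDH consumes no ATP but yields NADH. The sketch ignores transport steps, such as citrate export from mitochondria for the ACL route, as well as other cofactors and regulation, so it is a rough comparison only. The data structure and function are hypothetical.

```python
# ATP equivalents consumed per acetyl-CoA at each route's activating step
# (textbook stoichiometries; transport and regeneration costs ignored).
ROUTES = {
    "pyruvate bypass (PDC -> ALD -> ACS)": {
        "atp_equivalents": 2,  # ACS: ATP -> AMP + PPi
        "notes": "acetate + ATP + CoA -> acetyl-CoA + AMP + PPi",
    },
    "ATP-citrate lyase (ACL)": {
        "atp_equivalents": 1,  # ATP -> ADP + Pi
        "notes": "citrate + ATP + CoA -> acetyl-CoA + oxaloacetate + ADP + Pi",
    },
    "cytosolic PDH (cytoPDH)": {
        "atp_equivalents": 0,
        "notes": "pyruvate + CoA + NAD+ -> acetyl-CoA + CO2 + NADH",
    },
}


def rank_by_atp_cost(routes):
    """Return route names ordered from cheapest to most ATP-intensive."""
    return sorted(routes, key=lambda name: routes[name]["atp_equivalents"])


for name in rank_by_atp_cost(ROUTES):
    print(f"{ROUTES[name]['atp_equivalents']} ATP eq.  {name}: {ROUTES[name]['notes']}")
```

This ordering agrees with the cytoPDH result reported above. The real outcome also depends on enzyme expression, localization and cofactor effects that this count does not capture.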

In addition to uncovering the bottleneck of cytosolic acetyl-CoA biosynthesis in
wild-type yeast, ¹³C-MFA was also used to elucidate the effect of a heterologous
pathway that enhances acetyl-CoA supply, the phosphoketolase (PHK) pathway, in a
genetically modified yeast strain into which the genes xpkA and ack from Aspergillus
nidulans were introduced [52]. The PHK pathway is naturally used by several
bacterial species [98] and filamentous fungi for glucose dissimilation as an
alternative to the Embden-Meyerhof-Parnas (EMP) pathway. For example, in
A. nidulans, the use of this metabolic pathway led to increased carbon flow
toward acetate and acetyl-CoA through the action of a phosphotransacetylase
[99]. The flux distribution in the central metabolic pathways showed the positive role
of the PHK pathway in improving the supply of cytosolic acetyl-CoA in the
S. cerevisiae strain, which also accounted for the improved acetate yield.
Encouraged by this discovery, the same PHK pathway was co-expressed with a
wax ester synthase (ws2) and improved the titer of fatty acid ethyl
esters by 1.7-fold [100]. Such proof-of-concept studies indicated that the PHK
pathway could be established as a stand-alone route to divert flux from glycolysis
to cytosolic acetyl-CoA supply, and that it holds great potential for future improvement of
the production of acetyl-CoA-derived chemicals.


3.1.2 Bottlenecks in Fatty Acid Synthesis


Fatty acids are the precursors of transportation fuels and industrial
chemicals including surfactants, solvents, and lubricants [57]. The microbial
production of fatty acid-derived chemicals has recently been achieved in many
industrial applications. Escherichia coli can serve as an excellent host for fatty acid
production because of its fast growth, simple nutrient requirements, well-understood
metabolism, and well-established genetic tools. However, only a
small amount of free fatty acids is detectable under normal conditions in
wild-type E. coli. The synthesis of saturated fatty acids starts with the conversion
of acetyl-CoA into malonyl-CoA, catalyzed by ATP-dependent acetyl-CoA
carboxylase, and the transfer of the malonyl group from malonyl-CoA to acyl carrier protein (ACP),
catalyzed by malonyl-CoA:ACP transacylase (FabD), followed by cyclic chain
elongation (Table 2) [101].

Although various fatty acid over-producing strains have been created, most
studies have focused on engineering the terminal enzymes of fatty acid biosynthesis,
and little is known about how central metabolism responds to fatty acid production.
To reveal the metabolic bottlenecks in fatty acid production, 13C-MFA was
performed on an engineered fatty acid over-producing E. coli DH1 strain
over-expressing tesA and fadR, with fadE knocked out (Fig. 3b)
[57]. This 13C-MFA study clearly showed that E. coli metabolic flux was
redistributed in response to fatty acid over-production. Compared with
the wild-type strain, flux in the engineered strain was significantly
diverted from acetate synthesis to fatty acid synthesis, indicating that increasing
the supply of key precursors is crucial for fatty acid synthesis.
Fluxes through the pentose phosphate pathway
(PPP) also increased dramatically to supply large amounts of reducing power,
mostly NADPH, for fatty acid production in the engineered strain.
Finally, the anaplerotic flux into the TCA cycle decreased 1.7-fold
in the engineered strain and, consequently, more carbon flux was diverted to
supply cytosolic acetyl-CoA, the starting point of fatty acid biosynthesis. Overall,
13C-MFA identified the supply of fatty acid precursors and NADPH
as the key bottleneck in engineering microbes for fatty acid production.
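
Why precursor and NADPH supply dominate can be seen from the stoichiometry of chain elongation. The sketch below is a minimal calculator, assuming the textbook cycle for an even-chain saturated fatty acid: one acetyl-CoA primer, one malonyl-CoA per elongation cycle (each costing one ATP at acetyl-CoA carboxylase), and two reductions per cycle. The function name and the `nadph_per_cycle` parameter are illustrative choices, not taken from the cited studies; in E. coli the enoyl-ACP reductase FabI is reported to prefer NADH, so one NADPH per cycle may be the more appropriate assumption there.

```python
def fatty_acid_demand(chain_length, nadph_per_cycle=2):
    """Precursor and cofactor demand for one even-chain saturated fatty acid.

    Assumes one acetyl-CoA primer, one malonyl-CoA (and one ATP at
    acetyl-CoA carboxylase) per elongation cycle, and two reductions per
    cycle, of which `nadph_per_cycle` use NADPH and the rest use NADH.
    """
    if chain_length < 4 or chain_length % 2:
        raise ValueError("expects an even chain length of at least 4")
    if nadph_per_cycle not in (0, 1, 2):
        raise ValueError("nadph_per_cycle must be 0, 1, or 2")
    cycles = chain_length // 2 - 1  # elongation cycles after the acetyl primer
    return {
        "acetyl_CoA": cycles + 1,  # primer plus one per malonyl-CoA
        "malonyl_CoA": cycles,
        "ATP": cycles,  # one per carboxylation of acetyl-CoA
        "NADPH": nadph_per_cycle * cycles,
        "NADH": (2 - nadph_per_cycle) * cycles,
    }


print(fatty_acid_demand(16))
print(fatty_acid_demand(16, nadph_per_cycle=1))
```

For palmitate (C16) this gives 8 acetyl-CoA, 7 malonyl-CoA, and 7 ATP, with 14 NADPH under the two-NADPH assumption, or 7 NADPH plus 7 NADH when one reduction per cycle is NADH-dependent.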

Numerous engineering strategies have been proposed and explored to improve
fatty acid production. For example, to overcome the limited
supply of fatty acid precursors, acetyl-CoA carboxylase was over-expressed to
provide more malonyl-CoA, a key precursor for fatty acid synthesis, which
successfully enhanced fatty acid production [102, 103]. In another study, the
fatty acid degradation pathway was removed by knocking out fadE in E. coli;
together with over-expression of tesA and fabF, this enhanced the fatty acid yield
nearly threefold [104]. Similarly, co-expressing fabZ and a thioesterase from
Ricinus communis in a deletion mutant of fadD (a key
gene in fatty acid degradation) enhanced the fatty acid titer
nearly threefold (Table 2) [105]. Overall, the precursor issue identified by 13C-MFA
has now been largely addressed in microbial engineering for fatty acid production.
The inadequate supply of NADPH, the other issue revealed by 13C-MFA in fatty acid
production, is likely the next issue that metabolic engineers need to address.


3.1.3 Bottlenecks in Pentose Phosphate Pathway 


The pentose phosphate (PP) pathway, well known as the limiting step in providing
sufficient NADPH for biochemical synthesis (discussed in Sect. 3.2), is also the
essential pathway supplying precursors for the synthesis of nucleotides and
nucleic acids (from ribose 5-phosphate) and of aromatic amino acids such as
phenylalanine and tyrosine (from erythrose 4-phosphate). Several aromatic
compounds, such as shikimic acid, a valuable drug precursor, can be produced from
PP pathway metabolites. To investigate the PP pathway bottleneck in
shikimic acid biosynthesis, 13C-MFA was recently applied to four
S. cerevisiae strains engineered to produce different amounts of
shikimic acid [106]. Comparing the flux distributions of the
four strains showed that higher flux through the
PP pathway correlated positively with higher shikimic acid production,
indicating that low flux into the PP pathway could be
the bottleneck for shikimic acid production. Indeed, when
the native phenylalanine and tyrosine synthesis pathway was removed and
aro1, aro4, and tkl were overexpressed to increase flux into the
PP pathway, shikimic acid production increased nearly twofold in S. cerevisiae
(Table 2).

Similarly, riboflavin, an important industrial product of the PP pathway,
is produced commercially with engineered Bacillus subtilis strains
[107]. Over the past two decades, 13C-MFA has been applied to both wild-type and
engineered B. subtilis strains [108, 109] to unravel the metabolic rewiring in
riboflavin-producing strains, knowledge that could help further improve riboflavin
production. The intracellular flux distributions of a riboflavin-producing
B. subtilis strain were rigorously investigated via 13C-MFA at three different
dilution rates in chemostats [108]. The PP pathway was found to be activated,
supplying not only sufficient precursors but also sufficient NADPH.
More interestingly, the flux analysis showed that NADPH was produced in excess in
B. subtilis at all three dilution rates,
especially at the low dilution rate without riboflavin production. In other words,
the estimated NADPH requirement for biomass and riboflavin synthesis was lower
than the NADPH formation. Thus, the high production of
riboflavin and purine nucleotides is attributed to the sufficient precursor supply
from the PP pathway in B. subtilis. Notably, transhydrogenase,
which catalyzes the reversible interconversion of NADPH and NADH, played an
important role in re-oxidizing the excess NADPH generated by the
highly active PP pathway.

3.2 Identifying Cofactor Imbalance Issues of Host
Metabolism


Cofactors such as NADH/NAD+ and NADPH/NADP+ play major roles as
redox carriers in catabolic and anabolic reactions and as important
agents of energy transfer in the cell. NADH/NAD+ functions as a cofactor pair
in over 300 redox reactions and regulates various enzymes and genetic processes
[56]. Under aerobic growth, NADH carries electrons from the carbon source to the
terminal electron acceptor, oxygen. Under anaerobic growth in the absence of an
alternative electron acceptor, regeneration of NAD+ is essential for redox balance
and is achieved through fermentation, in which NADH is used to reduce metabolic
intermediates [110, 111]. Thus, NADH/NAD+ balance is crucial under both aerobic
and anaerobic conditions. NADPH/NADP+, the phosphorylated forms of
NADH/NAD+, drive anabolic reactions. The enzymatic synthesis of
important compounds such as fatty acids and amino acids depends heavily
on NADPH as the source of reducing equivalents. The major pathways supplying
NADPH during heterotrophic growth on glucose are the oxidative pentose phosphate
pathway, the Entner-Doudoroff pathway, and NADP+-dependent isocitrate
dehydrogenase in the TCA cycle [53]. The balance between NADH
and NADPH also plays an important role in providing sufficient NADPH for
anabolic reactions: NAD(P) transhydrogenase catalyzes the reversible
interconversion of NADH and NADPH to balance the cofactors [112], and, at the
expense of 1 mol ATP, NAD kinase can also phosphorylate NAD(H) to NADP(H)
[55].
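
To make the supply side of this balance concrete, the sketch below tallies NADPH formation from the three central routes named above using textbook stoichiometries: 2 NADPH per glucose 6-phosphate through the oxidative PP pathway (glucose-6-phosphate dehydrogenase plus 6-phosphogluconate dehydrogenase), 1 NADPH per glucose 6-phosphate entering the ED pathway (glucose-6-phosphate dehydrogenase only), and 1 NADPH per isocitrate through NADP+-dependent isocitrate dehydrogenase. The flux values are hypothetical, normalized to a glucose uptake of 100 units as is customary in 13C-MFA, and other NADPH sources (e.g., malic enzyme, transhydrogenase) are deliberately left out.

```python
# NADPH formed per unit of net flux through each route (textbook stoichiometry).
NADPH_PER_FLUX = {
    "oxidative_pp": 2,  # G6PDH + 6-phosphogluconate dehydrogenase
    "entner_doudoroff": 1,  # G6PDH only; 6-phosphogluconate goes on to KDPG
    "nadp_icdh": 1,  # NADP+-dependent isocitrate dehydrogenase
}


def nadph_supply(fluxes):
    """Return NADPH formed by each route plus the total.

    `fluxes` maps route names (keys of NADPH_PER_FLUX) to net fluxes
    normalized to a glucose uptake of 100 units.
    """
    per_route = {name: NADPH_PER_FLUX[name] * v for name, v in fluxes.items()}
    per_route["total"] = sum(per_route.values())
    return per_route


# Hypothetical flux values, for illustration only.
example = {"oxidative_pp": 30.0, "entner_doudoroff": 5.0, "nadp_icdh": 40.0}
print(nadph_supply(example))
```

Comparing such a supply total with the NADPH demand of biomass and product synthesis is the kind of cofactor accounting that 13C-MFA makes possible once the fluxes themselves have been estimated.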

In cofactor-dependent production systems, cofactor availability and balance
are likely to play a vital role in dictating the overall process yield.
Hence, identifying the key pathways that balance cofactor levels would be
helpful for the rational design of metabolic engineering strategies to further
increase biochemical production. 13C-MFA is one of the few analytical tools that can
rigorously determine cofactor usage in cell metabolism, and it has been applied to
many industrial microorganisms to reveal cofactor imbalance issues related to
biochemical production. In this section, we summarize several important
cofactor balance issues discovered via 13C-MFA and the corresponding
metabolic engineering strategies to solve them (Fig. 4 and Table 3).


3.2.1 Cofactor Imbalance in Xylose Fermentation of S. cerevisiae


S. cerevisiae, which ferments sugars anaerobically to ethanol at high rates,
has been domesticated for millennia and continuously selected as a workhorse for
bioethanol production. However, S. cerevisiae cannot utilize xylose anaerobically,
although xylose is the second most abundant sugar in lignocellulose, a renewable and
non-food-competitive resource. One of the most common strategies to engineer
S. cerevisiae to utilize xylose is to introduce a fungal xylose pathway from
xylose-utilizing yeasts such as Pichia stipitis. In this pathway,
xylose is converted to fermentable xylulose through consecutive redox
reactions catalyzed by NADPH-dependent xylose reductase (XR) and
NAD+-dependent xylitol dehydrogenase (XDH), with xylitol as the
intermediate. However, because the two steps use different cofactors, the fungal
pathway creates a notorious cofactor imbalance that severely limits xylose utilization
in S. cerevisiae. More importantly, this cofactor imbalance is not an isolated issue:
as shown in several 13C-MFA studies, it is intertwined with central
metabolism and induces network-level rewiring of carbon fluxes. For example,
xylose utilization by recombinant S. cerevisiae strains producing ethanol under
oxygen-limited conditions was systematically investigated by 13C-MFA
[36]. 13C tracer experiments and metabolic flux
analysis of six recombinant strains carrying XR and XDH of different origins
[36] revealed a universally high activity of the oxidative pentose phosphate
pathway, which supplied NADPH for XR. Strong TCA cycle activity was also
observed, indicating that large amounts of NADH had to be
re-oxidized by oxidative phosphorylation. As a result of this global metabolic
rewiring, only a small fraction of the carbon flux was directed to ethanol
production.

Fig. 4 Cofactor imbalance issues identified by 13C-MFA and the corresponding metabolic
engineering strategies. Left column: cofactor imbalance issues in (a) xylose utilization of
S. cerevisiae strains, (b) fatty acid and fatty acid-derived chemical production, and (c) L-valine
production. Right column: the corresponding metabolic engineering strategies to tackle cofactor
imbalance issues: (d) altering the cofactor specificities of xylose reductase (XR) or xylitol
dehydrogenase (XDH); (e) overexpressing transhydrogenase to balance NADH and
NADPH. Abbreviations: mXR Mutated xylose reductase, mXDH Mutated xylitol dehydrogenase,
oxPP pathway Oxidative pentose phosphate pathway

Numerous metabolic engineering efforts have been devoted to solving the cofactor
imbalance in xylose utilization by S. cerevisiae. One strategy
is to partially alter the cofactor preference of the two enzymes, that is, to make
XR prefer NADH or XDH prefer NADP+, which creates a cofactor-balanced cycle for the
first two steps of xylose utilization. This cofactor engineering
strategy has proved effective in several studies. For example, replacing
the native Pichia stipitis XR with a mutant XR with increased preference for
NADH improved the ethanol yield by ~40% and decreased xylitol
production [113-119]. Similar successes have been achieved in several
attempts to increase the NADP+ preference of XDH, which improved
ethanol production by 28-41% [120-123]. Another strategy to
tackle the cofactor imbalance is to engineer cofactor-dependent metabolic
pathways so as to decrease xylitol production and enhance ethanol yield.
For instance, lowering the flux through the NADPH-producing pentose phosphate
pathway could increase ethanol yield and decrease xylitol production.
This was attributed to the reduced NADPH supply, which increases the use of NADH by
XR and thus partially balances the cofactor usage of XR and
XDH [124]. In addition, replacing the NADPH-producing oxidative PP pathway
(glucose-6-phosphate dehydrogenase) with a fungal NADP+-dependent
D-glyceraldehyde-3-phosphate dehydrogenase (NADP-GAPDH) can produce NADPH for XR
without losing carbon as CO2, leaving more carbon for ethanol production
[125]. Directly improving NAD+ regeneration by introducing heterologous
genes can also decrease xylitol production and increase ethanol production
[126]. Besides the two-step oxidoreductase pathway, xylose isomerase offers
another efficient route for xylose fermentation: it converts xylose to xylulose
without cofactors and hence bypasses the cofactor
imbalance (Table 3). However, a recent 13C-MFA study indicated that
lower glycolytic activity, which leads to inefficient re-oxidation of NADH, could
become a new bottleneck when xylose isomerase is used in S. cerevisiae [127].
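
The cofactor-balance argument behind the XR/XDH engineering strategies can be written as a small redox ledger per mol of xylose converted to xylulose. The sketch below is illustrative, with hypothetical parameter names: f is the share of XR flux that uses NADH instead of NADPH, and g is the share of XDH flux that uses NADP+ instead of NAD+. The net NADH formed is (1 - g) - f and the net NADPH formed is g - (1 - f), so both vanish exactly when f + g = 1.

```python
def xylose_redox_balance(xr_nadh_fraction, xdh_nadp_fraction=0.0):
    """Net cofactor change per mol xylose converted to xylulose via XR and XDH.

    xr_nadh_fraction: share of XR flux using NADH instead of NADPH (f).
    xdh_nadp_fraction: share of XDH flux using NADP+ instead of NAD+ (g).
    Positive values mean net formation of the reduced cofactor,
    negative values mean net consumption.
    """
    for x in (xr_nadh_fraction, xdh_nadp_fraction):
        if not 0.0 <= x <= 1.0:
            raise ValueError("fractions must lie in [0, 1]")
    # XDH reduces NAD+ (or NADP+); XR oxidizes NADH (or NADPH).
    net_nadh = (1.0 - xdh_nadp_fraction) - xr_nadh_fraction
    net_nadph = xdh_nadp_fraction - (1.0 - xr_nadh_fraction)
    return {"net_NADH": net_nadh, "net_NADPH": net_nadph}


print(xylose_redox_balance(0.0))  # strictly NADPH-dependent XR, NAD+-dependent XDH
print(xylose_redox_balance(0.5, 0.5))  # partially swapped preferences
```

With strictly NADPH-dependent XR and NAD+-dependent XDH, every mol of xylose leaves 1 mol NADH to be re-oxidized and 1 mol NADPH to be regenerated elsewhere, which is consistent with the high oxidative PP and TCA cycle fluxes described above. The real enzymes use mixed cofactors, which is why f and g are treated as continuous fractions here.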


3.2.2 Cofactor Imbalance in Chemical Biosynthesis 


The production of many chemicals, such as fatty acids and amino acids, requires
large amounts of cofactors. Thus, cofactor imbalance is tightly linked not
only to sugar utilization but also to chemical production. For example,
13C-MFA of wild-type and fatty acid over-producing
E. coli strains showed that the engineered strain requires much more
NADPH than the wild-type strain: 255 versus 179 units of
NADPH, with glucose uptake normalized to 100 units. However, the
NADPH supplied by central metabolism reached only 100 units [57],
which clearly indicated that more NADPH would be needed to increase
fatty acid production in E. coli. To balance NADPH usage, the
transhydrogenase reaction that converts NADH to NADPH was activated in
the engineered E. coli: its flux increased
by 70% compared with that in the wild-type strain (i.e., from 90 to 153 units) to support
fatty acid biosynthesis.
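
The numbers above can be checked with a few lines of arithmetic. The sketch below uses only the values quoted in the text (all normalized to a glucose uptake of 100 units); the function and key names are illustrative.

```python
def nadph_balance(demand, central_supply, th_flux, th_flux_reference):
    """Summarize an NADPH balance normalized to a glucose uptake of 100 units."""
    gap = demand - central_supply  # NADPH not covered by central pathways
    th_change = (th_flux - th_flux_reference) / th_flux_reference  # relative change
    residual = demand - (central_supply + th_flux)  # left after transhydrogenase
    return {"gap": gap, "transhydrogenase_change": th_change, "residual": residual}


# Values for the fatty acid over-producing strain quoted in the text [57].
summary = nadph_balance(demand=255.0, central_supply=100.0,
                        th_flux=153.0, th_flux_reference=90.0)
print(summary)
```

Central metabolism leaves a gap of 155 units, the transhydrogenase flux rises by (153 - 90)/90 = 70%, and together the two sources cover 253 of the 255 units; the numbers quoted here do not account for the remaining ~2 units.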

Realizing the importance of cofactor balance, particularly NADPH supply,
in biochemical production, metabolic engineers have adopted various strategies to
overcome this challenge. One strategy is to switch the cofactor specificity of a
glycolytic enzyme such as GAPDH from NAD+ to NADP+, which
creates an NADPH-producing glycolytic pathway that increases the
availability of NADPH; this approach improved NADPH-dependent lycopene
production by ~100% (Table 3) [53]. Another strategy is to redirect metabolic flux
from glycolysis into the pentose phosphate pathway to enhance
NADPH supply, by overexpressing zwf, which encodes glucose-6-phosphate
dehydrogenase (G6PDH) [128-130], deleting pfkA and pfkB, which encode
phosphofructokinase (PFK) [131], or deleting pgi, which encodes phosphoglucose
isomerase [132, 133]. In addition, transhydrogenase [134, 135] or NAD kinase [53]
has been overexpressed to further boost NADPH/NADP+ availability in E. coli and
other microorganisms.

In addition to fatty acid production, the biosynthesis of amino acids such as
L-lysine and L-valine requires NADPH as an enzymatic cofactor.
Corynebacterium glutamicum, one of the industrial workhorses for amino acid
production, has been studied extensively by 13C-MFA
[46, 136-139], and many metabolic engineering strategies informed by 13C-MFA have
been developed to improve its amino acid production. For example, L-lysine,
one of the major products of C. glutamicum, is synthesized from pyruvate
and oxaloacetate at a cost of 4 mol NADPH per mol L-lysine. Several 13C-MFA
studies of the intracellular metabolic rewiring in L-lysine-producing C. glutamicum
strains revealed that flux through the PP pathway increased to supply NADPH, and
that the anaplerotic carboxylation pathway was also enhanced to provide enough
precursors for L-lysine synthesis [137, 138, 140]. Similarly, 13C-MFA revealed a
significant increase in PP pathway flux associated with L-valine production in a
pyruvate dehydrogenase complex-deficient C. glutamicum strain, which again
indicated that NADPH supply was the key issue in L-valine production [54].
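
The 4 mol NADPH per mol L-lysine stated above translates directly into a required oxidative PP flux. The sketch below is a simplified estimate under stated assumptions: fluxes are normalized to a glucose uptake of 100 units, each glucose 6-phosphate oxidized in the oxidative PP pathway yields 2 NADPH, and any other NADPH sources (for example NADP+-dependent isocitrate dehydrogenase) and the biomass NADPH demand are supplied as hypothetical inputs rather than computed.

```python
def required_oxpp_flux(lysine_flux, biomass_nadph=0.0, other_nadph_supply=0.0):
    """Oxidative PP flux needed to meet NADPH demand (all per 100 glucose).

    Assumes 4 NADPH per L-lysine and 2 NADPH per glucose 6-phosphate
    oxidized in the oxidative PP pathway; never returns a negative flux.
    """
    demand = 4.0 * lysine_flux + biomass_nadph
    shortfall = max(demand - other_nadph_supply, 0.0)
    return shortfall / 2.0


# Hypothetical lysine flux of 30 units, with and without other contributions.
print(required_oxpp_flux(30.0))
print(required_oxpp_flux(30.0, biomass_nadph=20.0, other_nadph_supply=40.0))
```

Even this crude estimate shows why PP pathway flux scales with lysine output: at a hypothetical lysine flux of 30 units, covering the product demand through the oxidative PP pathway alone would require 60 of every 100 glucose molecules to enter it.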

Based on these insights from 13C-MFA studies, various metabolic engineering
strategies have been developed to improve L-lysine and L-valine production. First,
to overcome the inadequate NADPH supply, PP pathway enzymes such as
glucose-6-phosphate dehydrogenase [141], transketolase, and transaldolase [142],
as well as fructose 1,6-bisphosphatase [143], were overexpressed to redirect flux
toward the PP pathway and improve L-lysine production. Second, besides
overexpressing native PP pathway enzymes, switching the cofactor
specificity of GAPDH from NAD+ to NADP+ improved
L-lysine production by ~50% without decreasing the cell growth
rate [144]. To further investigate the metabolic response to this alteration,
13C-MFA was again used to compare intracellular flux distributions between the
wild-type strain and mutants with altered GAPDH. The
mutant GAPDH was found to be the major source of NADPH in the mutant
strain, which had a similar PP pathway flux but higher L-lysine production. Finally,
expressing a transhydrogenase from E. coli to enhance NADPH supply
in C. glutamicum dramatically improved the L-valine yield,
by ~200% (Fig. 4c and Table 3) [54]. These results were consistent with the
expectations of the metabolic engineering strategies and, more importantly,
demonstrated that 13C-MFA can rationally guide metabolic engineering
and improve microbial performance.


3.3 Revealing Cell Maintenance Requirement of Industrial 
Microorganisms 


Metabolic engineering frequently involves the heterologous expression of a
series of recombinant proteins. With the development of synthetic
biology, more and more heterologous pathways are being introduced
into host cells to produce non-natural products, with multiple genes inserted,
deleted, replaced, or overexpressed. On the one hand, such genetic manipulation can
modify cell metabolism and divert more carbon flux into the desired chemicals. On
the other hand, metabolic engineering, particularly heterologous protein
overexpression, can interfere with host metabolism and impose severe
metabolic burdens, because transcription and translation are energetically expensive.
Such issues, however, have so far received little attention in the field of
metabolic engineering.

The metabolic burden on industrial microorganisms is often reflected in an elevated
cell maintenance energy, as revealed in several
pioneering 13C-MFA studies. In one such study, Pichia pastoris, a
methylotrophic yeast with an attractive ability to produce various heterologous
proteins [64-67], was investigated after introducing a mock plasmid, a low-copy
plasmid expressing Rhizopus oryzae lipase, or a high-copy plasmid expressing
R. oryzae lipase. The TCA cycle fluxes of both
lipase-expressing P. pastoris strains were much higher than those of the control
strain, producing more ATP to sustain cell growth and confirming that protein folding
and conformational stress indeed imposed a metabolic burden on the microbial host.
Similar metabolic rewiring, that is, elevated TCA cycle fluxes that provide more
ATP for cell maintenance, was also found in an S-adenosyl-L-methionine
(SAM)-producing S. cerevisiae strain [35] and a xylose-utilizing S. cerevisiae strain
(Table 4) [36].
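
One standard way to quantify maintenance energy, offered here as a swapped-in illustration rather than the method of the cited studies, is the Pirt relation q_ATP = mu / Y_max + m, where q_ATP is the specific ATP production rate (for example, derived from 13C-MFA fluxes), mu is the growth rate, Y_max is the maximal growth yield on ATP, and m is the growth-independent maintenance requirement. With chemostat data at several dilution rates, a straight-line fit recovers m as the intercept. The data below are hypothetical and were constructed to lie exactly on a line.

```python
def fit_pirt(mu, q_atp):
    """Least-squares fit of the Pirt relation q_ATP = mu / Y_max + m.

    mu: specific growth rates (1/h), e.g. chemostat dilution rates.
    q_atp: specific ATP production rates (mmol ATP per gDW per h).
    Returns (Y_max, m): maximal growth yield on ATP (gDW per mmol ATP)
    and maintenance requirement (mmol ATP per gDW per h).
    """
    n = len(mu)
    if n != len(q_atp) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_mu = sum(mu) / n
    mean_q = sum(q_atp) / n
    sxx = sum((x - mean_mu) ** 2 for x in mu)
    sxy = sum((x - mean_mu) * (y - mean_q) for x, y in zip(mu, q_atp))
    slope = sxy / sxx  # equals 1 / Y_max
    intercept = mean_q - slope * mean_mu  # equals m
    return 1.0 / slope, intercept


# Hypothetical data generated from q_ATP = 80 * mu + 5.
y_max, m = fit_pirt([0.05, 0.10, 0.20, 0.30], [9.0, 13.0, 21.0, 29.0])
print(y_max, m)
```

Fitting a control strain and a protein-expressing strain in this way, a larger intercept m for the latter would be one quantitative signature of the metabolic burden described above.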

To avoid introducing metabolic burdens, metabolic engineers have
explored three strategies: medium optimization, use of low-copy plasmids, and
promoter engineering. Optimizing the culture medium and fermentation conditions
can remove stresses such as nutrient limitation and hence reduce the
requirement for cell maintenance energy. For example, several novel
feeding strategies for culturing SAM-producing P. pastoris improved
SAM production by ~35% [145, 146]. In addition, high-copy plasmids have been
found to increase the risk of plasmid instability and metabolic burden [147],
because protein over-expression requires large amounts of building blocks and
energy, which can impair normal cell growth. A low-copy plasmid can therefore
sometimes be the better choice for chemical production. For
example, in a study engineering E. coli to produce lycopene [148], the
stationary-phase cell density of the strain carrying a high-copy plasmid was
approximately 24% lower than that of the strain with a low-copy plasmid and 30%
lower than that of the control culture. Similarly, the lycopene titer of the strain
with the high-copy plasmid was 20% lower than that of the strain with the low-copy
plasmid. Another way to decrease metabolic burden is to tune the promoter strengths
of pathway genes to balance the pathways and avoid the accumulation of toxic
compounds that inhibit growth. This approach was used to engineer more efficient
production of taxadiene in E. coli [10]. By tuning the expression levels of
the two modules of the taxadiene pathway, namely the native upstream
methylerythritol phosphate (MEP) pathway forming isopentenyl pyrophosphate and a
heterologous downstream pathway forming the terpenoid, the accumulation of indole,
a compound that inhibits cell growth, was minimized by expressing the
upstream pathway at a very low level. Correspondingly, taxadiene production
improved ~15-fold.


3.4 Elucidating the Mechanism of Microbial Resistance 
to Fermentation Inhibitors 


Environmental stresses, such as physical heat shock [149] and chemical acidity
[150], can affect the physiology and viability of microbial cells and decrease or
even halt bioprocess productivity. For example, lignocellulosic biofuels
hold promise for a sustainable fuel economy. However, chemical stresses from
toxic compounds in processed lignocellulosic hydrolysates, such as weak
acids, furans, and phenolic compounds [151], have hampered the economic
feasibility of these biofuels. It is therefore important to identify the intracellular
metabolic responses of industrial microorganisms to various stresses in order to
rationally improve their resistance to inhibitors [40]. Compared with other commonly
used approaches, such as transcriptomic and proteomic analyses, 13C-MFA is more
intuitive and provides direct, quantitative readouts of metabolic rewiring
under stress conditions. In this section, we introduce recent advances in the study of
stress responses using 13C-MFA and the corresponding metabolic engineering
strategies to improve microbial resistance to different inhibitors (Fig. 5).

Fig. 5 Mechanisms of microbial stress responses identified by 13C-MFA. Top: stress responses of
S. cerevisiae to furfural. Bottom: stress responses of E. coli to octanoic acid. Abbreviations:
ADH Alcohol dehydrogenase, PDH Pyruvate dehydrogenase, PoxB Pyruvate oxidase, PdhR
Pyruvate dehydrogenase regulator, NADH-DH NADH dehydrogenase, CycBO Cytochrome bo
oxidase, Cyc c Cytochrome c

Among the various toxic compounds generated by lignocellulose pretreatment and
hydrolysis, furfural is an important contributor to toxicity toward S. cerevisiae.
Although S. cerevisiae has a weak intrinsic ability to reduce
furfural to the less toxic furfuryl alcohol, a holistic view of its metabolic response
to furfural was still missing. To investigate the flux distribution of S. cerevisiae
under increasing furfural concentrations, 13C-MFA was
applied to the wild-type and several evolved furfural-resistant strains in
micro-aerobic, glucose-limited chemostats [39]. As revealed by 13C-MFA,
NADH-dependent oxidoreductases, which catalyze the reduction of furfural, were
the main defense mechanism at lower furfural concentrations (<15 mM),
whereas NADPH-dependent oxidoreductases became the major resistance mechanism
at higher concentrations (>15 mM). Accordingly, increased carbon flux through the
pentose phosphate pathway was the main physiological response to high furfural
concentrations, which indicated that NADPH supply is key to helping S. cerevisiae
resist furfural stress. Inspired by this discovery, metabolic engineers
overexpressed several NADPH-dependent oxidoreductases, particularly ADH7 and
YKL071W, and increased the furfural resistance of the parent
S. cerevisiae strain by 200% [39].

In another study, 13C-MFA was used to examine the metabolic responses of
E. coli to octanoic acid stress [38]. Comparing the flux distributions of
stressed and unstressed E. coli revealed decreased flux through the TCA cycle and
increased flux through the pyruvate oxidase pathway producing acetate. It was
hypothesized that octanoic acid disrupts the membrane and thereby destabilizes
membrane-bound proteins such as NADH dehydrogenase, leading to an NAD+ deficiency
that would down-regulate several key NAD+-dependent reactions, such as malate
dehydrogenase in the TCA cycle and the pyruvate dehydrogenase multi-enzyme complex.
The pyruvate pool also shrank under octanoic acid stress, which could be linked to
regulation by PdhR, a pyruvate-sensitive regulator that controls the expression of
the PDH complex, NADH dehydrogenase II, and cytochrome bo-type oxidase, encoded by
aceEF and lpdA, ndh, and cyoABCDE, respectively [152]. Based on the 13C-MFA results,
several possible strategies to further enhance C8 acid tolerance were proposed,
including supplementing the medium with pyruvate and replacing
NADH/NAD+-sensitive enzymes.


4 Perspectives of Synergizing 13C Metabolic Flux Analysis
with Metabolic Engineering


Conventional 13C-MFA has been widely applied to quantify microbial
metabolism and to guide metabolic engineers in developing numerous strategies
to improve biochemical production. However, several technical
limitations still restrict the accuracy and flexibility of 13C-MFA. For example,
13C-MFA can only be applied at metabolic and isotopic steady state [62], which
makes it difficult to use when target chemicals are produced under non-steady-state
conditions (e.g., drug synthesis in the stationary growth phase). In addition, most
conventional 13C-MFA studies are limited to central metabolism [153], which is of
very limited use for analyzing the secondary metabolism of microorganisms. To
overcome these challenges, novel experimental and computational methods have
recently been developed to empower 13C-MFA studies. In this section we
summarize recent breakthroughs in 13C-MFA and provide a perspective on novel
routes to achieve synergy between 13C-MFA and metabolic engineering.


4.1 Expand 13C-MFA into Genome Scale


Conventional 13C-MFA can only be applied to determine flux distributions in the
central metabolic network, mainly because of (1) the difficulty of measuring the
isotopic labeling of numerous low-abundance metabolites and (2) the huge
computational burden of simulating the isotopic labeling of all metabolites in
genome-scale metabolic networks. However, with the rapid development of
high-resolution mass spectrometry, accurate measurement of the isotopic labeling of
low-abundance metabolites has become possible, as reported by several groups
[154-156]. On the computational side, an E. coli genome-scale model (imPR90068)
spanning 1,039 metabolites and 2,077 reactions has recently been constructed for
13C-MFA [153]. To calculate the genome-scale metabolic flux distribution, a
total of 1.37 x 10!’ isotopomers would need to be simulated [153]. Thanks to the
implementation of the EMU method [85], the computational burden was decreased
by one to two orders of magnitude and, for the first time, the fluxes in all the
metabolic pathways of E. coli were elucidated. Compared with conventional 13C-MFA,
genome-scale 13C-MFA can rigorously determine metabolic
rewiring in secondary metabolism, from which many high-value chemicals, such
as drugs, could be produced. Genome-scale 13C-MFA could thus provide valuable
information about the metabolic rewiring in response to the production of these
secondary metabolites and guide the development of rational metabolic engineering
strategies, in a similar way to that used for improving bulk chemical production.
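
Two ideas behind this computational burden can be shown in a few lines. A molecule with n carbon atoms has 2^n positional 13C isotopomers, so simulating all isotopomers of every metabolite scales exponentially with carbon number. The EMU (elementary metabolite unit) approach instead tracks mass isotopomer distributions (MIDs) of only the atom subsets needed to simulate the measurements, and one of its core operations is that the MID of an EMU formed by condensing two smaller EMUs is the convolution of their MIDs. The toy sketch below shows only that operation; the full EMU algorithm also decomposes the network into EMU levels and solves linear balance equations, which is not reproduced here.

```python
def isotopomer_count(n_carbons):
    """Number of positional 13C isotopomers of a molecule with n carbons."""
    return 2 ** n_carbons


def convolve_mid(mid_a, mid_b):
    """MID of an EMU formed by condensing two smaller EMUs.

    mid_a[i] is the fraction of EMU A carrying i heavy (13C) atoms; the
    condensed EMU's MID is the discrete convolution of the two inputs.
    """
    out = [0.0] * (len(mid_a) + len(mid_b) - 1)
    for i, a in enumerate(mid_a):
        for j, b in enumerate(mid_b):
            out[i + j] += a * b
    return out


print(isotopomer_count(6))  # a six-carbon metabolite such as glucose
print(convolve_mid([0.5, 0.5], [1.0, 0.0, 0.0]))  # 50%-labeled C1 + unlabeled C2
print(convolve_mid([0.5, 0.5], [0.5, 0.5]))  # two 50%-labeled C1 units
```

Because an MID of an n-atom EMU has only n + 1 entries, working with MIDs rather than 2^n isotopomers is what keeps the simulation tractable.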


4.2 Isotopic Non-stationary 13C-MFA (13C-INST-MFA)


13C-INST-MFA is a cutting-edge technology developed recently [66-70] that enables
the application of 13C-MFA to various autotrophic systems, including
cyanobacteria and plants. In brief, instead of collecting 13C-labeling patterns at
isotopic steady state, 13C-INST-MFA tracks the dynamics of 13C-labeling in
intracellular metabolites and uses computational algorithms to calculate the
steady-state metabolic fluxes that best fit the 13C-labeling kinetics. 13C-INST-MFA
has been applied to determine the photosynthetic metabolism of
Synechocystis sp. PCC6803 [157] and Arabidopsis thaliana [158]. Such autotrophic
metabolism cannot be examined by conventional 13C-MFA because, when cells are
fed 13CO2, all metabolites become uniformly labeled at isotopic steady state and
the information about pathway usage is completely lost. The merit of
13C-INST-MFA for metabolic engineering lies in the fact that the metabolism of
numerous autotrophic systems, which used to be poorly understood, can now be
rigorously determined. Because many autotrophic systems are promising cell factories
[159-161] that convert CO2 into valuable chemicals, we can envision that 13C-INST-MFA
could help metabolic engineers better understand and more rationally modify
such systems to improve the production of autotrophic products.
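
The principle that labeling kinetics carry flux information lost at isotopic steady state can be illustrated with a deliberately simplified toy model, not the network-wide procedure used in the cited studies. Assume a single well-mixed metabolite pool of known size C, turned over at a constant flux v, whose precursor is switched to 100% tracer at t = 0. Its labeled fraction then follows x(t) = 1 - exp(-v t / C), which linearizes to -ln(1 - x) = (v / C) t. Real 13C-INST-MFA fits ordinary differential equation models of all measured labeling states, often estimating pool sizes as additional parameters; all names and values below are hypothetical.

```python
import math


def labeled_fraction(t, flux, pool_size):
    """Labeled fraction of one well-mixed pool after a step to 100% tracer."""
    return 1.0 - math.exp(-flux * t / pool_size)


def fit_flux(times, fractions, pool_size):
    """Estimate the flux through the pool from its labeling time course.

    Linearizes x(t) = 1 - exp(-v t / C) as -ln(1 - x) = (v / C) t and fits
    the slope through the origin by least squares. Fully labeled samples
    (x == 1) carry no information about v and are skipped.
    """
    pairs = [(t, -math.log(1.0 - x)) for t, x in zip(times, fractions) if x < 1.0]
    if not pairs:
        raise ValueError("all samples are fully labeled; the flux is not identifiable")
    slope = sum(t * y for t, y in pairs) / sum(t * t for t, _ in pairs)
    return slope * pool_size


# Synthetic, noise-free time course: pool size 2.0, true flux 0.5 (arbitrary units).
times = [1.0, 2.0, 4.0, 8.0]
fractions = [labeled_fraction(t, 0.5, 2.0) for t in times]
print(fit_flux(times, fractions, 2.0))

# At long times the pool is fully labeled whatever the flux.
print(labeled_fraction(200.0, 0.5, 2.0), labeled_fraction(200.0, 5.0, 2.0))
```

The last line mirrors the 13CO2 problem described above. Once the pool is fully labeled, a tenfold difference in flux is invisible, whereas the early time points recover the flux.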


4.3 13C-Based Dynamic Metabolic Flux Analysis (13C-DMFA)


13C-DMFA has recently been developed as an approach to investigate microbial
metabolism at a metabolic non-steady state [30]. Compared with conventional kinetic
models describing microbial dynamics [162-165], 13C-DMFA can reveal the
dynamic reprogramming of intracellular fluxes and thus provides in-depth
understanding of microbial metabolism at the pathway level. In one of the
proof-of-concept studies, E. coli metabolism in a fed-batch fermentation process for
overproduction of 1,3-propanediol (PDO) was investigated. By introducing several
additional parameters to describe the fed-batch fermentation process, a time-resolved flux
map was generated, which showed that the intracellular flux associated with the PDO
pathway increased by 10% and that the split ratio between glycolysis and the pentose
phosphate pathway decreased from 70/30 to 50/50. 13C-DMFA has thus provided a way
for metabolic engineers to investigate dynamic metabolism during industrial
fermentation, especially fed-batch fermentation. It is also expected that 13C-DMFA
could be further extended to study microbial metabolism in the stationary growth phase,
during which numerous high-value secondary metabolites are often produced. With
insight into metabolic rewiring at non-steady state, metabolic
engineers could develop more appropriate strategies to improve biochemical
production, particularly microbial drug production.
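The split ratio quoted above is each branch's share of the combined flux at a given time point. The sketch below assumes that a time-resolved flux map is available at a few sampling times and that each flux varies linearly between them (a simplification chosen here for illustration; 13C-DMFA itself fits the flux time courses to the measured data). The helper names and flux values are hypothetical, picked only to reproduce the 70/30 to 50/50 shift described in the text.

```python
def interpolate(flux_map, t):
    """Fluxes at time t, assuming each flux varies linearly between sampled times."""
    times = sorted(flux_map)
    for t0, t1 in zip(times, times[1:]):
        if t0 <= t <= t1:
            w = (t - t0) / (t1 - t0)
            return tuple(a + w * (b - a) for a, b in zip(flux_map[t0], flux_map[t1]))
    raise ValueError("t lies outside the sampled time window")


def split_ratio(v_glycolysis, v_ppp):
    """Percentage of the combined flux entering glycolysis vs. the pentose phosphate pathway."""
    total = v_glycolysis + v_ppp
    return round(100 * v_glycolysis / total), round(100 * v_ppp / total)


# Hypothetical time-resolved flux map: time (h) -> (glycolytic flux, PPP flux), arbitrary units
flux_map = {0.0: (7.0, 3.0), 20.0: (5.0, 5.0)}

for t in (0.0, 10.0, 20.0):
    print(t, split_ratio(*interpolate(flux_map, t)))
```

Run as written, this prints the splits (70, 30), (60, 40), and (50, 50) at 0, 10, and 20 h, i.e., a gradual redistribution of carbon toward the pentose phosphate pathway over the course of the run.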


4.4 Improve Flux Resolution of 13C-MFA via the Integration of Isotopic Patterns from Parallel Labeling Experiments


Parallel labeling experiment design has been widely applied in 13C-MFA to
improve the observability of global metabolic networks by conducting multiple
labeling experiments with different isotopic tracers simultaneously [166-169].
Another recent advance in 13C-MFA is the integration of the data (i.e.,
isotopic labeling patterns) from parallel labeling experiments to improve flux
resolution [170-172]. This data integration has been combined with the rapid
development of high-throughput measurement techniques and computational
algorithms [170-172], and it offers unique advantages compared with conventional
13C-MFA [173], particularly improved precision of flux estimation [170, 171] and
reduced time for labeling experiments. With more precise estimates of
intracellular carbon fluxes, metabolic engineers can reasonably expect a
higher-resolution view of microbial metabolism in the near future, which should
enable fine-tuned engineering strategies for improving biochemical production.
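Why combining parallel experiments tightens flux estimates can be seen in a linear-Gaussian approximation: independent experiments contribute additive information, so the combined variance is 1/Σ(1/σᵢ²) and is never larger than that of the best single experiment. The sketch below applies inverse-variance weighting to two hypothetical flux estimates. It only illustrates this statistical principle; it is not the integrated-fitting procedure of the studies cited above, which combine the raw labeling data in one model rather than averaging separately fitted fluxes.

```python
import math


def combine(estimates):
    """Inverse-variance weighted combination of independent estimates.

    estimates: list of (value, standard_error) pairs for the same flux.
    Returns (combined_value, combined_standard_error).
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)  # information from independent experiments adds up
    value = sum(w * v for w, (v, _) in zip(weights, estimates)) / total
    return value, math.sqrt(1.0 / total)


# Hypothetical estimates of one flux from two parallel tracer experiments
value, se = combine([(10.0, 2.0), (12.0, 1.0)])
print(round(value, 2), round(se, 3))  # 11.6 0.894
```

The combined standard error (about 0.89) is smaller than that of the more precise experiment alone (1.0), which is the effect exploited when labeling data from several tracers are analyzed together.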
Xenobiology: State-of-the-Art, Ethics,
and Philosophy of New-to-Nature Organisms 


Markus Schmidt, Lei Pei, and Nediljko Budisa 


Abstract The basic chemical constitution of all living organisms in the context of
carbon-based chemistry consists of a limited number of small molecules and
polymers. Until the twenty-first century, biology was mainly an analytical science; it
has now reached a point where it merges with engineering science, paving the way for
synthetic biology. One of the objectives of synthetic biology is to change the
chemical composition of living cells, that is, to create an artificial biological
diversity, which in turn fosters a new sub-field of synthetic biology, xenobiology. In
particular, the genetic code in living systems is based on highly standardized chem- 
istry composed of the same “letters” or nucleotides as informational polymers (DNA, 
RNA) and the 20 amino acids which serve as basic building blocks for proteins. The 
universality of the genetic code enables not only vertical gene transfer within the same
species but also horizontal gene transfer across biological taxa, which requires a high
degree of standardization and interconnectivity. Although some minor alterations of
the standard genetic code are found in nature (e.g., proteins containing non-canonical
amino acids exist in nature, and some organisms use alternative coding systems), all
deep structural changes to the chemistry of living systems are generally lethal,
making the creation of artificial biological systems an extremely difficult challenge.

In this context, one of the great challenges for bioscience is the development of a 
strategy for expanding the standard basic chemical repertoire of living cells. 
Attempts to alter the meaning of the genetic information stored in DNA as an 
informational polymer by changing the chemistry of the polymer (i.e., xeno-nucleic 
acids) or by changes in the genetic code have already yielded successful results. In 
the future this should enable the partial or full redirection of the biological
information flow to generate “new” version(s) of the genetic code derived from the “old”
biological world. 

In addition to the scientific challenges, the attempt to increase biochemical
diversity also raises important ethical and philosophical issues. Although proponents of this
branch of synthetic biology highlight the many potential applications to come (e.g.,
novel tools for diagnostics and fighting infectious diseases), such developments could
also bring risks affecting social, political, and other structures of nearly all societies.


Keywords Ethics, New-to-nature, Non-canonical amino acids, Philosophy, 
Synthetic biology, Xenobiology 
1 The Future of Biology


The genetic program of living cells is often described as the “software of life”
[1, 2]. But are we able to read and interpret this “software” correctly? If so,
we would be able to understand how life works, and we could try to change and even
improve the “software” with man-made versions. From the viewpoint of synthetic
biology, living cells are small, programmable production units (comparable to a
robot or a chemical machine). Researchers on the frontlines of this field are seeking
ways to understand and create new types of cells for useful purposes, such as
engineering cells to produce nearly any imaginable chemical compound, natural or
synthetic, for use in the advancing fields of medicine and technology.

From the beginning of agricultural domestication (e.g., wheat cultivation, which
can be dated back thousands of years) [3], and especially since the onset of genetic
engineering (in the 1970s to 1980s, with the development of molecular cloning
technologies) [4], the gap between natural and modified organisms has been steadily
increasing. Modified organisms now include not only those harboring heterologous
genes from other natural organisms but also those with a totally artificial
genetic makeup. At the end of this road lies artificial life, genetically as well as
metabolically distant from its natural origin. The new life forms will probably be
genetically isolated; that is, they will possess a kind of genetic firewall serving as a
biological containment strategy to prevent horizontal gene transfer [5]. Horizontal
gene transfer, the spread of genes from one species to another, is common and is
facilitated mainly by phage-mediated transduction, sequence-independent uptake of
free DNA (transformation), or pilus-mediated conjugation. Accordingly, a stepwise
model with eight developmental phases toward the creation of alternative forms of
life has been developed [6], as shown in Fig. 1. Supplemented with the categories of
xenobiology and genetic firewall, this model places the current scientific
development of xenobiology somewhere between steps 6 (synthetic genomes),
7 (refactored genomes), and eventually 8 (alternative genomes).

One of the keystones of Darwinism is the fact that geographically (and hence
genetically) isolated species tend to evolve unique and heritable changes over time.
The classical example is Darwin’s finches, which illustrate how the gene pools
of the finches have adapted to take advantage of feeding conditions in different
ecological settings for long-term survival. What is true for Darwin’s finches also
applies to cells in general. Through man-made, directed evolution of life forms we
can attempt to implement new and sophisticated chemistries
(elements, reactions, metabolic pathways) in the protoplasm of desired life forms
[7]. It is the ambition of a number of synthetic biologists to find out experimentally
how far we could go toward this objective [8]. As for the idea of a genetic firewall, it
is crucial to learn whether the standard chemical composition of terrestrial life
forms (invariant for almost four billion years!) could be changed in principle and
whether we could open the door to a parallel biological world. Meanwhile, experiments
to create new species must take into account caveats such as those raised by Buckling
et al. for experimental evolution, e.g., the simplicity of “test tube” conditions, the
homogeneity of the tested population, and other unpredictable factors. All these
caveats for experimental evolution also apply to xenobiological approaches to create
a parallel biological system [9].

In nature, the energy flux on Earth mandates a cyclic material flow with
simultaneous, continuous maintenance of order, resulting in the formation of
living systems. Morowitz stated that such a process was essential and deterministic
[10]. All the necessary information about the energy and material flows is encoded
in the genomes of organisms. Following this paradigm, the chemical composition
and the choice of fundamental units both affect the processes of life. The
molecules participating in life on Earth comprise primarily amino acid polymers
(proteins), nucleic acids (mainly DNA and RNA), lipids, and other small molecules
that act as coenzymes and cofactors. Although the number of monomeric building
blocks of life is rather small, their combinations lead to very diverse
molecules [11]. The genetic program of all living cells (and viruses) is mainly based
on information encoded in nucleic acid structures, and most biological activities are
determined by protein structures. In addition, large repertoires of other biomolecules,
such as fatty acids, carbohydrates, and small molecules, as well as metabolic
pathways and modes of information processing, are common to all cells [12].


2 Biology Can Be Synthesized Biologically 
and/or Chemically 


In the early days of what is now ubiquitous synthetic chemistry, the synthesis of complex
substances originally produced by plants and animals was assumed to be an
impossible task. Additionally, many physiological conditions were experimentally
inaccessible in those days. This left room for metaphysical
concepts such as the idea that organic compounds could only be formed in the presence
of a special, vital power (“vis vitalis”) acting exclusively in living creatures. Accordingly,
metaphysical concepts were used as the main criteria to distinguish between animate
and inanimate matter [2, 13]. Early in the nineteenth century, however, this
metaphysical viewpoint was proven wrong by the chemical synthesis of organic
molecules (e.g., Wöhler’s urea synthesis, the Harnstoffsynthese, in 1828) [14]. Although this was
not the first milestone in the synthesis of naturally occurring organic compounds,
from then on the awareness of the accessibility of natural organic molecules
increased. Complex compounds could be manufactured starting from simple structures
in a stepwise and controlled manner. Less than 50 years later, organic
synthetic chemistry had turned into an engineering discipline with the ambition to
synthesize all naturally occurring organic substances [15]. Nowadays, synthetic
biology has a similar goal: to define biological parts of living systems as modules,
standardize them, and combine these standardized parts into a novel organism.
Xenobiology goes a step further, aiming at the compositional (chemical) redesign
of these modules [16-18], which goes beyond synthetic biology’s concept of building
novel systems from naturally existing or modified modules.
In general, xenobiology aims to design biological systems endowed with unusual
biochemistries.

The concept of modularity, a prerequisite for synthetic biology, arose out of the
observed successes in other engineering fields (e.g., software or electronic
engineering) [19, 20]. A modular approach should simplify biological systems
enough to define first principles for a biological hierarchy, until the creation of
biological systems from scratch becomes achievable (bottom-up principle). As soon
as these modular units, for example, synthetic networks on the levels of
transcription, translation, and signal transduction, as well as metabolism, are
orthogonalized (uncoupled) from their biological context, it should be feasible to
add new parts to a system without unwanted side effects or cross-reactions. Of
course, this is an idealization of biological engineering, because in reality
unforeseen effects emerge as soon as the complexity of the desired organism
increases.

Biomolecules, and especially genes, have many undetermined degrees of
freedom at the molecular level and at the level of interactions and functions, which
explains why it is difficult to orthogonalize these molecules within their biological
context [21]. This is best illustrated by the recently reported construction of a new
synthetic “minimal” Mycoplasma whose genome contains 149 genes of unknown function
that are nevertheless essential for growth on a defined medium [22]. It is
indeed surprising that almost one-third of the “minimal” genome corresponds to
unknown functions. This was not apparent from the initial report on
creating artificial life: the construction of the “first self-replicating synthetic
bacterial cell” was accomplished by copying a natural genome into which additional
synthetic but essentially inactive DNA sequences were inserted [23]. Nonetheless,
the successful construction of synthetic cells proves that chemically synthesized
modules can be turned into living cells, which paves the way for turning synthetic
genomes containing unnatural genetic letters into living organisms as well.


3 Motives for the Development of New Biological Systems 


Synthetic biology offers prospects for the development of a multitude of novel,
chemically diverse biocatalysts for the production of fuels, additives, or medicines,
amongst others [24]. Whereas synthetic biology mainly works with naturally
existing building blocks and canonical chemistry, xenobiological applications
use non-natural building blocks and non-canonical chemistry. Thus xenobiology
aims to implement man-made chemical syntheses in living cells, e.g., engineered
organisms that could carry out metathesis reactions [25] or similar chemical
transformations that are still the exclusive domain of the synthetic organic chemist.

The simplest biological models used in xenobiology are microorganisms, which
can be seen as ready-made production systems primarily governed by their genetic
programs [26-28]. By introducing small changes in the genetic program of an
organism, a bioengineer can achieve significant changes in what it produces. This is
done experimentally through refunctionalization, reprogramming, or
recoding of natural processes [29, 30].

The fundamental difference between synthetic biology and xenobiology is that
in synthetic biology living systems are restructured via exchange and combination
of standardized parts (modules, biobricks). In contrast, xenobiology uses
non-natural (or so-called non-canonical) molecules to create CMOs (chemically
modified organisms) [16]. These CMOs can make use of chemical elements that life
has so far barely used (e.g., fluorine or boron), novel “letters,” building blocks, or
scaffolds (the differences between GMOs and CMOs are shown in Fig. 2). To achieve
this, researchers aim to devise an alternative genetic code, which requires
converting the whole flow of genetic information [1, 5, 7, 11, 12, 32].

Fig. 2 Differences between GMOs and CMOs in a hypothetical experiment of genome-wide
redesign of standardized genetic code in a given organism. (GM genetic modification, MAGE
multiplex automated genome engineering [31])

Besides serving as novel building blocks for genomes, non-canonical DNA
bases can also be developed into diagnostic tools for infectious diseases [33]. The
unnatural base pair system consists of an expanded genetic alphabet built
site-specifically into oligonucleotide fragments, or introduced via enzymatic
incorporation of extra, functional components into nucleic acids. These fragments
containing unnatural base pairs can be amplified by PCR. Molecular beacons with a
fluorescent dye linked to the unnatural bases can serve as molecular diagnostic tools, for
example, to target infectious diseases of interest [34]. Furthermore, aptamers
containing unnatural bases are considered valuable for pharmaceutical applications
because of their unique features in affinity, thermostability, and resistance to
nucleases [35].

If we manage to change the way the genetic code is read in a living organism and 
to add new “letters” or building blocks, the corresponding cell constitutes a genetic 
enclave because the genetic exchange with natural cells is impaired. This is an 
important aspect for biological safety, because the risk of horizontal gene transfer to
natural cells is expected to be strongly reduced [16, 21, 24, 36-38]. Therefore,
xenobiology seeks conditions in which the cells can be cultivated in the laboratory 
but stay genetically isolated from naturally occurring species [39]. 
4 Present State of Xenobiology


Currently, xenobiology, and even synthetic biology, are not widely harnessed to tackle
modern technological questions because of the overwhelming diversity of existing
structures and information transmission pathways (e.g., horizontal gene transfer,
mutation, recombination) already present in nature. Reprogrammed cells or proteins
equipped with synthetic structures are considered useful tools mainly for
academic research or small applications, if at all. Interestingly, however,
xenobiology is not a recent development. As early as the 1950s and 1960s (albeit
under a different name), the incorporation of, for example, noncanonical amino acids
(ncAAs) into the proteome of organisms was shown to be feasible [40]. Back then,
researchers used auxotrophic microbial strains, which had lost their ability to
synthesize a particular essential nutrient, and forced them, by feeding a
structurally similar artificial compound, to adjust to this substitute, leaving them
only the choice to “take it or leave it” [41-43].

The current synthesis of alternative biological systems within the framework of
genome engineering focuses in particular on the three universal biomolecules
DNA, RNA, and amino acids, and on genetic code redesign via directed
evolution of microbial strains. All basic constituent parts of DNA, that is, the
nucleobases, the deoxyribose, and the phosphate backbone, can be exchanged
with alternative chemical structures such as xeno-nucleic acids (XNAs)
[32, 44-49]. Accordingly, we can expect steady progress
in the construction of novel biological systems running with XNA in the near
future. For example, experimental evolution has been successfully used to engineer
bacterial genomes with XNA [50] or proteomes with ncAAs [51, 52] (see below).
Recently, Isaacs and Church [45, 53] also showed that the incorporation of various
ncAAs into some essential E. coli genes can serve as a promising biosafety tactic:
as long as the ncAA is absent from the medium, no bacterial growth can be
detected. Clearly, substitution of canonical amino acids with ncAAs and the
expansion of the genetic code in essential genes with ncAAs can be promising
strategies to further isolate synthetic organisms from natural ones [54, 55]. Such
strains may even have practical importance for applications such as bioremediation
(in open systems) or industrial biocatalysis (closed systems) [56].

In this context, it is attractive to reassign some of the (degenerate and rare)
codons of the genetic code, as recently reported by Budisa and Bohlke [57]. They
succeeded in “depriving” Escherichia coli of the capacity to read one of its own
triplets, the AUA codon, and thus “emancipated” these bacteria, allowing them to
translate all 5,797 AUA triplets with synthetic or alien amino acids. This represents a
first step toward so-called codon reassignment, in which new amino acids that do
not occur in nature are inserted into the genetic code. These cells feature a genetic
code different from that of all other living organisms and therefore represent a
preliminary stage toward completely synthetic cells.
Most recently, the group of Budisa [51] reported a long-term evolution experiment that led
to 20,899 reassignments in the genetic code of the bacterium Escherichia coli. In
particular, long-term cultivation in defined synthetic media resulted in
the evolution of cells capable of surviving the complete substitution of tryptophan by
thienopyrrole-alanine in their proteomes in response to all TGG codons in the genome.
These evolved bacteria with their new-to-nature amino acid composition are capable
of robust growth in the complete absence of the canonical (natural) amino acid
tryptophan. Such experimental results not only reveal that translational
ambiguity is essential for the evolution of alternative genetic codes; they also
pinpoint a strategy for the evolution of synthetic cells with alternative biochemistries
[58]. It should be noted, however, that the 20,899 UGG codons in Budisa’s
evolution experiments could be defined as trophically reassigned (i.e., the meaning
of a codon is redefined throughout the whole translational machinery of the
evolved cells only in the defined synthetic medium). Supplementing
such a medium with the canonical substrate tryptophan reverts the cells to
“natural” ones, as they still favor the incorporation of the canonical building
block. To achieve a nutrient-independent reassignment (i.e., “real” codon
reassignment) of all the genome’s UGG codons in E. coli, an experimental strategy
for biocontainment needs to be developed and executed.


5 Trophic and Semantic Containment, Astrobiochemistry 
and the Origin of Life 


The genetic code is almost universal on Earth, and the way it is read is crucial
for the transfer of genetic information. Creating a microbial strain with codons
reprogrammed throughout its genome would experimentally break this unity. The
natural limitation of the genetic code, exemplified by a limited repertoire of amino
acids as building units, could be transcended via trophic and/or semantic containment
[59]. Trophic containment makes microorganisms dependent on unnatural nutrients
(xeno-nutrients). In more detail, trophic containment means the implementation of
xeno-nutrients and the prevention of cross-feeding by natural alternative nutrients
or analogues [59], whereas semantic containment is based upon the prevention of
genetic information exchange, for example, by using different interpretation
systems such as those using XNA or those with an alternative genetic code [21].

An alternative genetic code may encode a smaller or larger number of
amino acids. In addition, a selection of amino acids could be replaced by ncAAs
(genetic code engineering), or a selection of ncAAs could be added to the genetic
code’s repertoire (genetic code expansion). To equip the genetic code with novel
chemical functionalities, some currently occupied codons have to be released from their
original function, thus uncoupling cells from the canonical reading of individual
codon triplets (codon emancipation) [19, 57, 60].
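Treating the genetic code as a lookup table makes the difference between code engineering and code expansion concrete. The sketch below assumes the standard code (NCBI translation table 1) and uses purely illustrative labels, "Tpa" for thienopyrrole-alanine and "ncAA" for a hypothetical new amino acid; the helpers `reassign` and `translate` are hypothetical too. It models only the outcome of translation: in real cells a reassignment also requires suitable tRNAs and aminoacyl-tRNA synthetases and, for trophic reassignment, the right growth medium.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code (NCBI translation table 1); codons ordered UUU, UUC, UUA, ..., GGG
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def reassign(code, codon, new_meaning):
    """Return a copy of a codon table in which one codon has a new meaning."""
    recoded = dict(code)
    recoded[codon] = new_meaning
    return recoded


def translate(mrna, code):
    """Translate an mRNA reading frame, stopping at the first stop codon ('*')."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        residue = code[mrna[i:i + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return protein


# Code engineering: every UGG now means Tpa, so Trp is no longer encoded at all.
engineered = reassign(STANDARD_CODE, "UGG", "Tpa")
# Code expansion: the rare Ile codon AUA is emancipated for a new amino acid,
# while Ile is still encoded by AUU and AUC.
expanded = reassign(STANDARD_CODE, "AUA", "ncAA")

mrna = "AUGUGGAUAAUUUAA"  # Met-Trp-Ile(AUA)-Ile(AUU)-stop
print(translate(mrna, STANDARD_CODE))  # ['M', 'W', 'I', 'I']
print(translate(mrna, engineered))     # ['M', 'Tpa', 'I', 'I']
print(translate(mrna, expanded))       # ['M', 'W', 'ncAA', 'I']
```

In the engineered table tryptophan disappears from the repertoire (the amino acid set is replaced), whereas in the expanded table isoleucine survives on its remaining codons and the emancipated AUA adds a new building block.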
Meanwhile, xenobiology is an emergent area at the interface of synthetic biology
and synthetic organic chemistry that aims to construct biological systems
endowed with novel biochemistries such as XNAs and/or “xeno” amino acids
(usually called noncanonical amino acids, ncAAs). Xenobiology aims to answer
a fundamental question in the chemistry of life: can biological systems also
function with an alternative genetic code composed of XNA, ncAAs, or both as building
blocks? Xenobiological research is therefore closely linked to studies on the
origin of life, including the development of the genetic code [5, 17, 61, 62].

Consequently, a dialogue with astrobiochemistry or astrobiology yields
important insights. For instance, intensive chemical analysis of carbon-containing
meteorites such as the Murchison meteorite demonstrated the presence
of more than 70 extraterrestrial amino acids, in which the L-enantiomers were
predominant [46]. By testing the suitability of these amino acids as building blocks for
the production of proteins in terrestrial life forms (viruses, archaea, eubacteria,
fungi, plants, and other eukaryotes), we would gain new knowledge regarding the
experimental rules and determinants by which the universal genetic code on
Earth is limited to 20 amino acids.

Xenobiology reached a highly significant milestone by demonstrating the synthesis
of an organism that in at least some aspect and under some environmental
conditions has a higher evolutionary fitness than a natural organism [63, 64]. In this
context, the above-mentioned experiment of the Budisa group [51] represents “the
most structurally disturbing deviation introduced into Life so far” (P. Marlière,
personal communication). It shows how far the experimental evolution of the
bacterium Escherichia coli can be pushed by demonstrating that complete replacement
of one of the endogenous building blocks, tryptophan (20,899 TGG codons), by
an exogenous/synthetic one (thienopyrrole-alanine) is possible. Moreover, the
general importance of such an engineering experiment is enormous: it suggests that
ncAAs may indeed be advantageous in some artificial media (environments),
which is also important to keep in mind when we search for life elsewhere in
the universe. Namely, such life most likely “would have a biochemistry different from life
on our planet” [58].


6 Ethical and Philosophic Considerations on Xenobiology 


The ability of xenobiology to construct microorganisms with new-to-nature
biological systems and functions by cellular tinkering and experimental evolution
requires all stakeholders to use this technology safely and responsibly [36]. The
issues include both ethical and philosophical considerations. Responsible research
calls for engagement from all involved stakeholders, ranging from researchers in
academia and industry to regulatory authorities and the public at large
[65, 66]. Regarding ethical considerations, we should bear in mind that xenobiology
is not a fundamental challenge to bioethics. Like synthetic biology, of which it is a
subfield and which has attracted considerable attention in bioethics, xenobiology is
still part of biology.
The first ethical consideration is safety. This concern has multiple facets,
ranging from biosafety and risk assessment to impacts on health and the environment
[67, 68]. The biosafety challenges of xenobiology have been reviewed as part of
synthetic biology risk assessment [69-71]. Possible impacts on health needing
careful attention include novel toxicity, allergenicity, and pathogenicity of the
new-to-nature molecules (and eventually life forms) generated by xenobiology;
potential impacts on the environment include ecological competitiveness
and the degree of horizontal gene transfer (e.g., there is a lack of metrics to measure
the escape frequency of these types of containment) [72]. Challenges raised by
xenobiology for biosecurity, direct or indirect, cannot be ruled out either. A direct
challenge would be the development of novel pathogens with no available treatment
options, because their different makeup might make them resistant to available drugs
or to the native defense mechanisms of recipient hosts; an indirect challenge would be
the abuse of techniques developed in xenobiological research to produce toxins or
restricted chemicals.

The second consideration is the view of novel organisms as machines [73-76].
Although the moral status of microorganisms is contested, some scholars insist
that microbes should be granted moral status, including the right to
exist as they are [77]. In a reverse scenario, xenobiology, instead of posing a
threat to microbes, speculates about adding novel species to the world (although
contained) that are different from existing ones on a biochemical level (see, e.g.,
[56]). For the time being, the rate at which new “species” can be created is foreseeably
much lower than the rate at which existing biodiversity is being eliminated as a
consequence of human behavior in the so-called Anthropocene. The argument of adding
new species should not be taken as an excuse to curtail efforts to
stop the ongoing global biodiversity decline, because it is hardly possible to
compare the (ecological) importance of an existing species with that of a newly
created one.

The third consideration is related to intellectual property (IP). The IP issue
centers on how the knowledge and technologies accumulating in research
can be shared and translated to the market [65, 76, 78-80]. This type of technology
might, however, also be developed into a means of stringent control of production
based on microbes equipped with controlled synthetic auxotrophy, which could lead to the
establishment of terminator-like technological solutions (where a certain chemical
has to be purchased to guarantee the survival of the cells). Xenobiology could,
however, also be explored and developed by open source biologists into a realm of
biotechnology that is free of IP restrictions.

Another scenario is the development of different IP regimes in distinct 
xenobiological fields (e.g., restrictive in HNA and permissive in CeNA variants 
of nucleic acids, or vice versa). It is unclear to what extent current access and
benefit-sharing agreements, such as the UN Nagoya protocol [81], apply to
xenobiology at all, opening the way for alternative forms of sharing [82]. 

Among philosophical considerations, apart from the extensively discussed “playing
God” concern about hubris, xenobiological research raises questions about scientific
attempts to alter or redirect evolution [83, 84]. Microbes with a novel genetic makeup
(e.g., using XNA as genetic information carrier) have been engineered, and these
usually rely on error-prone replication of their genetic material [85]. However, natural
evolution has taken millions of years to reach the less error-prone biological
systems we know today. The rationale for altering or redirecting this natural process
to fit research purposes may therefore be questioned. One may also ask whether it makes
sense to redirect evolution within an extremely short time frame to achieve something
comparable to natural evolution.

Xenobiological research should learn from the experience of other research fields
(such as agricultural research, stem cell research, and nanotechnology) and take
into account the ethical and philosophical considerations relating to its research
and development. In addition, research in the field should embed the Responsible
Research and Innovation framework, so that xenobiology can serve as a model for an
emerging technology, avoid developing technology that does not benefit society, and
build trust in research and innovation. More open (and open-ended) debates on
xenobiology are needed to achieve the broad stakeholder involvement envisaged by
Responsible Research and Innovation.


Acknowledgments The thoughts and ideas presented here are largely the result of our interaction
with Philippe Marlière, Sven Panke, Piet Herdewijn, Carlos Acevedo-Rocha, and Dirk
Schulze-Makuch. Another very fortunate circumstance was that we worked together in the EU FP7-funded
project METACODE (289572), through which we could start to implement some of our conceptual ideas
in the field of xenobiology. MS and NB also acknowledge support from EC FP7 project
SYNPEPTIDE (613981), and MS acknowledges EC FP7 project SYNENERGENE (321488).


References 


1. Budisa N (2014) Life at the speed of light. From the double helix to the dawn of digital life. By 
J. Craig Venter. Angew Chem Int Ed 53(36):9421-9422 

2. Venter JC (2013) Life at the speed of light: from the double helix to the dawn of
digital life. Viking Penguin, New York

3. Balter M (2015) Farming was so nice, it was invented at least twice. Science News. Available at
http://www.sciencemag.org/news/2013/07/farming-was-so-nice-it-was-invented-least-twice

4. Cameron DE, Bashor CJ, Collins JJ (2014) A brief history of synthetic biology. Nat Rev 
Microbiol 12(5):381-390

5. Schmidt M (2010) Xenobiology: a new form of life as the ultimate biosafety tool. BioEssays 
32(4):322-331 

6. de Lorenzo V (2010) Environmental biosafety in the age of synthetic biology: do we really 
need a radical new approach? Environmental fates of microorganisms bearing synthetic 
genomes could be predicted from previous data on traditionally engineered bacteria for in 
situ bioremediation. BioEssays 32(11):926-931 

7. Wiltschi B, Budisa N (2007) Natural history and experimental evolution of the genetic code. 
Appl Microbiol Biotechnol 74(4):739-753 

8. Heinemann M, Panke S (2006) Synthetic biology: putting engineering into biology.
Bioinformatics 22(22):2790-2799

9. Buckling A, Craig Maclean R, Brockhurst MA, Colegrave N (2009) The Beagle in a bottle. 
Nature 457(7231):824-829


312 M. Schmidt et al. 


10. Morowitz HJ, Heinz B, Deamer DW (1988) The chemical logic of a minimum protocell. Orig
Life Evol Biosph 18(3):281-287

11. Budisa N (2004) Prolegomena to future experimental efforts on genetic code engineering by
expanding its amino acid repertoire. Angew Chem Int Ed Engl 43(47):6426-6463

12. Budisa N (2005) Expanding the amino acid repertoire for the design of novel proteins. Wiley-VCH,
Weinheim/New York/Brisbane/Singapore/Toronto

13. Church G, Regis E (2012) Regenesis: how synthetic biology will reinvent nature and ourselves.
Basic Books, New York

14. Multhauf RP (1966) The origins of chemistry. Oldbourne, London

15. Fischer E (1907) Synthetic chemistry in its relation to biology (Faraday Lecture). J Chem Soc
Trans 91:1749-1765

16. Acevedo-Rocha CG, Budisa N (2011) On the road towards chemically modified organisms
endowed with a genetic firewall. Angew Chem Int Ed Engl 50(31):6960-6962

17. Mampel J, Buescher JM, Meurer G, Eck J (2013) Coping with complexity in metabolic
engineering. Trends Biotechnol 31(1):52-60

18. Schmidt M, de Lorenzo V (2012) Synthetic constructs in/for the environment: managing the
interplay between natural and engineered biology. FEBS Lett 586(15):2199-2206

19. Acevedo-Rocha CG (2016) The synthetic nature of biology. In: Hagen K, Engelhard M,
Toepfer G (eds) Ambivalences of creating life: societal and philosophical dimensions of
synthetic biology. Springer, Switzerland, pp 9-53

20. Agapakis CM, Silver PA (2009) Synthetic biology: exploring and exploiting genetic modularity
through the design of novel biological networks. Mol Biosyst 5(7):704-713

21. Budisa N (2014) Xenobiology, new-to-nature synthetic cells and genetic firewall. Curr Org
Chem 18(8):936-943

22. Hutchison CA III, Chuang RY, Noskov VN, Assad-Garcia N, Deerinck TJ, Ellisman MH,
Gill J, Kannan K, Karas BJ, Ma L, Pelletier JF, Qi ZQ, Richter RA, Strychalski EA, Sun L,
Suzuki Y, Tsvetanova B, Wise KS, Smith HO, Glass JI, Merryman C, Gibson DG, Venter JC
(2016) Design and synthesis of a minimal bacterial genome. Science 351(6280):aad6253

23. Gibson DG, Glass JI, Lartigue C, Noskov VN, Chuang RY, Algire MA, Benders GA,
Montague MG, Ma L, Moodie MM, Merryman C, Vashee S, Krishnakumar R, Assad-Garcia N,
Andrews-Pfannkoch C, Denisova EA, Young L, Qi ZQ, Segall-Shapiro TH, Calvey CH,
Parmar PP, Hutchison CA 3rd, Smith HO, Venter JC (2010) Creation of a bacterial cell
controlled by a chemically synthesized genome. Science 329(5987):52-56

24. SCHER, SCENIHR, SCCS (2014) Opinion on synthetic biology I: definition. Available at
http://ec.europa.eu/health/scientific_committees/emerging/docs/scenihr_o_044.pdf

25. Mayer C, Gillingham DG, Ward TR, Hilvert D (2011) An artificial metalloenzyme for olefin
metathesis. Chem Commun (Camb) 47(44):12068-12070

26. Alterovitz G, Muso T, Ramoni MF (2010) The challenges of informatics in synthetic biology:
from biomolecular networks to artificial organisms. Brief Bioinform 11:80-95

27. Danchin A (2009) Information of the chassis and information of the program in synthetic cells.
Syst Synth Biol 3(1-4):125-134

28. Landrain TE, Carrera J, Kirov B, Rodrigo G, Jaramillo A (2009) Modular model-based design
for heterologous bioproduction in bacteria. Curr Opin Biotechnol 20(3):272-279

29. Keasling JD (2008) Synthetic biology for synthetic chemistry. ACS Chem Biol 3(1):64-76

30. Noirel J, Ow SY, Sanguinetti G, Wright PC (2009) Systems biology meets synthetic biology: a
case study of the metabolic effects of synthetic rewiring. Mol Biosyst 5(10):1214-1223

31. Lajoie MJ, Rovner AJ, Goodman DB, Aerni HR, Haimovich AD, Kuznetsov G, Mercer JA,
Wang HH, Carr PA, Mosberg JA, Rohland N, Schultz PG, Jacobson JM, Rinehart J, Church
GM, Isaacs FJ (2013) Genomically recoded organisms expand biological functions. Science
342(6156):357-360

32. Herdewijn P, Marliere P (2009) Toward safe genetically modified organisms through the
chemical diversification of nucleic acids. Chem Biodivers 6(6):791-808

33. Benner SA, Sismour AM (2005) Synthetic biology. Nat Rev Genet 6(7):533-543


Xenobiology: State-of-the-Art, Ethics, and Philosophy of New-to-Nature Organisms 313 


34. Kimoto M, Cox RS 3rd, Hirao I (2011) Unnatural base pair systems for sensing and diagnostic
applications. Expert Rev Mol Diagn 11(3):321-331

35. Matsunaga K, Kimoto M, Hanson C, Sanford M, Young HA, Hirao I (2015) Architecture of
high-affinity unnatural-base DNA aptamers toward pharmaceutical applications. Sci Rep
5:18478

36. SCHER, SCENIHR, SCCS (2015) Opinion on synthetic biology II: risk assessment
methodologies and safety aspects. Available at
http://ec.europa.eu/health/scientific_committees/consultations/public_consultations/scenihr_consultation_26_en.htm

37. Wright O, Delmans M, Stan GB, Ellis T (2015) GeneGuard: a modular plasmid system
designed for biosafety. ACS Synth Biol 4(3):307-316

38. Wright O, Stan GB, Ellis T (2013) Building-in biosafety for synthetic biology. Microbiology
159(Pt 7):1221-1235

39. Schmidt M (2013) Safeguarding the genetic firewall with xenobiology. 21st century
borders/synthetic biology: focus on responsibility and governance, Institute on Science for
Global Policy, Tucson, Arizona

40. Brookes P (1959) Studies on the incorporation of an unnatural amino acid,
p-di-(2-hydroxy[14C2]ethyl)amino-L-phenylalanine, into proteins. Br J Cancer 13:313-317

41. Beiboer SH, van den Berg B, Dekker N, Cox RC, Verheij HM (1996) Incorporation of an
unnatural amino acid in the active site of porcine pancreatic phospholipase A2. Substitution of
histidine by 1,2,4-triazole-3-alanine yields an enzyme with high activity at acidic pH. Protein
Eng 9(4):345-352

42. Budisa N, Minks C, Alefelder S, Wenger W, Dong F, Moroder L, Huber R (1999) Toward the
experimental codon reassignment in vivo: protein building with an expanded amino acid
repertoire. FASEB J 13(1):41-51

43. Lemeignan B, Sonigo P, Marliere P (1993) Phenotypic suppression by incorporation of an
alien amino acid. J Mol Biol 231(2):161-166

44. Jang MY, Song XP, Froeyen M, Marliere P, Lescrinier E, Rozenski J, Herdewijn P (2013) A
synthetic substrate of DNA polymerase deviating from the bases, sugar, and leaving group of
canonical deoxynucleoside triphosphates. Chem Biol 20(3):416-423

45. Mandell DJ, Lajoie MJ, Mee MT, Takeuchi R, Kuznetsov G, Norville JE, Gregg CJ, Stoddard
BL, Church GM (2015) Biocontainment of genetically modified organisms by synthetic
protein design. Nature 518(7537):55-60

46. Pinheiro VB, Holliger P (2012) The XNA world: progress towards replication and evolution of
synthetic genetic polymers. Curr Opin Chem Biol 16(3-4):245-252

47. Pinheiro VB, Taylor AI, Cozens C, Abramov M, Renders M, Zhang S, Chaput JC, Wengel J,
Peak-Chew SY, McLaughlin SH, Herdewijn P, Holliger P (2012) Synthetic genetic polymers
capable of heredity and evolution. Science 336(6079):341-344

48. Steele FR, Gold L (2012) The sweet allure of XNA. Nat Biotechnol 30(7):624-625

49. Taylor AI, Pinheiro VB, Smola MJ, Morgunov AS, Peak-Chew S, Cozens C, Weeks KM,
Herdewijn P, Holliger P (2015) Catalysts from synthetic genetic polymers. Nature
518:427-430

50. Marliere P, Patrouix J, Doring V, Herdewijn P, Tricot S, Cruveiller S, Bouzon M, Mutzel R
(2011) Chemical evolution of a bacterium’s genome. Angew Chem Int Ed Engl
50(31):7109-7114

51. Hoesl MG, Oehm S, Durkin P, Darmon E, Peil L, Aerni HR, Rappsilber J, Rinehart J, Leach D,
Soll D, Budisa N (2015) Chemical evolution of a bacterial proteome. Angew Chem Int Ed Engl
54(34):10030-10034

52. Ma Y, Biava H, Contestabile R, Budisa N, di Salvo ML (2014) Coupling bioorthogonal
chemistries with artificial metabolism: intracellular biosynthesis of azidohomoalanine and its
incorporation into recombinant proteins. Molecules 19(1):1004-1022

53. Rovner AJ, Haimovich AD, Katz SR, Li Z, Grome MW, Gassaway BM, Amiram M, Patel JR,
Gallagher RR, Rinehart J, Isaacs FJ (2015) Recoded organisms engineered to depend on
synthetic amino acids. Nature 518:89-93


54. Dolgin E (2015) Safety boost for GM organisms. Nature 517:423 

55. Nunes-Alves C (2015) GMOs in lockdown. Nat Rev Microbiol 13:3443 

56. Schmidt M, de Lorenzo V (2016) Synthetic bugs on the loose: containment options for deeply 
engineered (micro)organisms. Curr Opin Biotechnol 38:90-96

57. Bohlke N, Budisa N (2014) Sense codon emancipation for proteome-wide incorporation of 
noncanonical amino acids: rare isoleucine codon AUA as a target for genetic code expansion. 
FEMS Microbiol Lett 351(2):133-144 

58. Acevedo-Rocha CG, Schulze-Makuch D (2015) How many biochemistries are available to 
build a cell? Chembiochem 16(15):2137-2139 

59. Marliere P (2009) The farther, the safer: a manifesto for securely navigating synthetic species 
away from the old living world. Syst Synth Biol 3(1-4):77-84 

60. Hoesl MG, Budisa N (2012) Recent advances in genetic code engineering in Escherichia coli. 
Curr Opin Biotechnol 23(5):751-757

61. Acevedo-Rocha CG, Fang G, Schmidt M, Ussery DW, Danchin A (2013) From essential to 
persistent genes: a functional approach to constructing synthetic life. Trends Genet 
29:273-279 

62. Popa R (2010) Necessity, futility and the possibility of defining life are all embedded in its 
origin as a punctuated-gradualism. Orig Life Evol Biosph 40(2):183-190 

63. Pezo V, Metzgar D, Hendrickson TL, Waas WF, Hazebrouck S, Doring V, Marliere P, 
Schimmel P, De Crecy-Lagard V (2004) Artificially ambiguous genetic code confers growth 
yield advantage. Proc Natl Acad Sci U S A 101(23):8593-8597 

64. Xiao H, Nasertorabi F, Choi SH, Han GW, Reed SA, Stevens RC, Schultz PG (2015) 
Exploring the potential impact of an expanded genetic code on protein function. Proc Natl 
Acad Sci U S A 112(22):6961-6966 

65. Konig H, Dorado-Morales P, Porcar M (2015) Responsibility and intellectual property in 
synthetic biology: a proposal for using Responsible Research and Innovation as a basic 
framework for intellectual property decisions in synthetic biology. EMBO Rep 16:1055-1059

66. Owen R, Stilgoe J, Macnaghten P, Gorman M, Fisher E, Guston D (2013) A framework for 
responsible innovation. In: Owen R, Bessant J, Heintz M (eds) Responsible innovation, vol 
1. John Wiley & Sons, London, pp 27-50 

67. EGE (2009) Ethically speaking. Available at
http://ec.europa.eu/archives/bepa/european-group-ethics/docs/ethic_speak_n13_en.pdf

68. EGE (2009) Ethics of synthetic biology. Available at
http://ec.europa.eu/bepa/european-group-ethics/docs/opinion25_en.pdf

69. De Vriend H (2006) Constructing life. Early social reflections on the emerging field of
synthetic biology. Available at
http://www.rathenau.nl/uploads/tx_tferathenau/WED97_Constructing_Life_2006.pdf

70. European Commission (2010) Synthetic biology from science to governance. Workshop 
organised by the European Commission’s Directorate-General for Health & Consumers, 
Brussels 

71. Schmidt M (2008) Diffusion of synthetic biology: a challenge to biosafety. Syst Synth Biol
2(1-2):1-6

72. SCHER, SCENIHR, SCCS (2015) Opinion on synthetic biology III: research priorities.
Available at
http://ec.europa.eu/health/scientific_committees/consultations/public_consultations/scenihr_consultation_28_en.htm

73. Baertschi B (2013) Defeating the argument from hubris. Bioethics 27(8):435-441 

74. Boldt J, Muller O, Maio G (2009) Synthetische Biologie Eine ethisch-philosophische Analyse. 
Bundesamt fur Bauten und Logistik, Bern 

75. Deplazes A (2009) Piecing together a puzzle. An exposition of synthetic biology. EMBO Rep 
10(5):428-432 

76. Douglas T, Savulescu J (2010) Synthetic biology and the ethics of knowledge. J Med Ethics
36(11):687-693

77. Cockell CS (2011) Microbial rights? EMBO J 12:181 


78. Frank D, Heil R, Coenen C, Konig H (2015) Synthetic biology’s self-fulfilling prophecy - 
dangers of confinement from within and outside. Biotechnol J 10(2):231-235

79. Heavey P (2013) Synthetic biology ethics: a deontological assessment. Bioethics
27(8):442-452

80. Minssen T, Rutz B, van Zimmeren E (2015) Synthetic biology and intellectual property rights: 
six recommendations. Biotechnol J 10(2):236-241 

81. CBD (2012) Nagoya Protocol on access to genetic resources and the fair and equitable sharing 
of benefits arising from their utilization to the convention on biological diversity. Available at 
https://www.cbd.int/abs/doc/protocol/nagoya-protocol-en.pdf 

82. Trojok RD (2014) Bio-commons whitepaper. Available at
http://bioartsociety.fi/Bio-Commons_Whitepaper.pdf

83. Dabrock P (2009) Playing God? Synthetic biology as a theological and ethical challenge. Syst 
Synth Biol 3(1-4):47-54 

84. van den Belt H (2009) Playing God in Frankenstein’s footsteps: synthetic biology and the 
meaning of life. Nanoethics 3(3):257-268

85. Malyshev DA, Dhami K, Lavergne T, Chen T, Dai N, Foster JM, Correa IR Jr, Romesberg FE 
(2014) A semi-synthetic organism with an expanded genetic alphabet. Nature
509(7500):385-388


Index 


A 
ABC transporter, 221 
Acetaldehyde, 107, 276, 277 
dehydrogenase (ALD6), 106, 181, 184, 
189, 192 
Acetate, 100, 107, 184, 189, 194, 222, 241, 
276-278, 287 
Acetolactate, 196 
synthase (ALS), 101, 196 
Acetyl-CoA, 54, 67, 99, 106, 151, 161, 184, 
190-192, 197, 225, 273, 276, 278 
acetyltransferase, 155 
carboxylase, 67, 278 
synthetase, 106, 107 
N-Acetylglucosamine, 103 
Acinetobacter calcoaceticus, 191, 192 
Acrolein, 202 
Alcohol dehydrogenase (ADH), 101, 242 
Alginate, 93, 175, 182 
Allose, 200 
Allyloxycarbonyl-Nε-methyl-L-lysine, 6
α-Amylase, 244
Amino acids, 218 
aromatic analogs, 13 
biosynthesis, 50 
hydroxy, 5 
noncanonical (ncAA), 1, 14, 307, 309 
unnatural (UAAs), 98 
Aminoacyl-tRNA synthetases, 2 
Amino-6-((2-azidoethoxy)carbonylamino) 
hexanoic acid, 8 
Amino-2-hydroxy-L-hexanoic acid 
(Boc-LysOH), 5 


Amorphadiene, 68, 94, 103, 120, 131, 133, 198 
Anthocyanins, 102 
Antioxidants, 175, 200 
Arabinose, 59, 168, 181, 182, 187, 201, 220, 
221, 244 
Arabitol, 222 
Arginine, 50, 223, 237-239 
Artemisia annua, 103 
Artemisinic aldehyde dehydrogenase 
(ALDH1), 103 
Artemisinin, 22, 94, 103, 104, 120, 198, 240 
Aspartokinase, 53, 54, 57, 58, 69 
Aspergillus 
A. nidulans, 101, 191, 192, 277 
A. niger, 101 
A. terreus, 95, 102, 199 
Azidocyclo-pentyloxy carbonyl-L-lysine 
(ACPK), 5, 9 
Azinomycin, 123 


B 
Bacillus 
B. cereus, 189 
B. subtilis, 235 
Bacterial microcompartments (MCPs), 102 
Berberine, 121 
β-Glucosidase, 97, 182, 201, 244
Bifidobacterium longum, 199 
BioBrick, 85, 134 
Biocatalysis, 118 
Biofuels, 77, 122, 138, 175, 242, 265, 285 
yeast, 185 


Biohydrogen, 122 
Bioswitches, 45 
DNA-level, 48 
engineering, 55 
natural, 47 
protein-level, 53 
RNA-level, 51 
Bis(dansyl)cystamine, 12 
N,O-Bis(trimethylsilyl)trifluoroacetamide 
(BSTFA), 269 
Bottleneck, 265 
Brevibacterium ammoniagenes, 191 
Butanediol (BDO), 94, 106, 156, 158, 175, 193, 
204 
Butanol, 123, 187, 189 
2-[4-{(bis[(1-tert-butyl-1H-1,2,3-triazol-4-yl)
methyl]amino)methyl}-1H-1,2,3-
triazol-1-yl]ethyl hydrogen sulfate
(BTTES), 9
N-tert-Butyldimethylsilyl-N- 
methyltrifluoroacetamide 
(MTBSTFA), 269 


C
Candida 
C. albicans, 199 
C. intermedia, 182 
Carbon-13, metabolic flux
analysis (¹³C-MFA), 205
Carotenoids, 86, 240 
Cascade reactions, 117-138 
Catechins, 86, 102 
Catechol 1,2-dioxygenase, 96, 97, 199 
Cell-cell communication, 69
Cell factories, 77 
Cell-free biosynthesis, 117 
Cell-free extracts (CFXs), 123 
Cellobiose, 97, 175, 182, 187, 199, 201, 244 
Cellodextrins, 178, 182 
Cellodextrin transporter 2 (CDT2), 97, 201 
Cellular regulation, 45 
Cellulases, 182, 244 
Cellulose, 130, 156, 182, 184, 244 
Cheilanthifoline synthase (CFS), 106 
Chemically modified organisms (CMOs), 306 
Chloramphenicol acetyltransferase, 5 
Chorismate mutase, 54 
Chromatin immunoprecipitation (ChIP), 50 
Chromosomal integration (CI), 91 
Circular polymerase extension cloning 
(CPEC), 135 
cis-Aconitic acid decarboxylase 
(CAD), 95, 101 
Citrobacter freundii, 102 


Clostridium 
C. beijerinckii, 190 
C. butyricum, 161 
C. kluyveri, 157, 158 
Clustered regularly-interspaced short 
palindromic repeats (CRISPR), 95 
Clustered regularly interspaced short 
palindromic repeats interference 
(CRISPRi), 166 
Codon optimization, 96 
Coenzyme B12, 51
Cofactors, 99 
imbalance, 265, 280 
Combinatorial optimization, 117 
Compartmentalization, 100 
Copper-induced azide-alkyne cycloaddition 
(CuAAC), 9 
Corynebacterium glutamicum, 217, 283 
Crosslinkers, 7 
Crotonyl-L-lysine, 9 
2-Cyanobenzothiazole (CBT), 10 
Cysteinyl-L-lysine, 10 
Cytochrome c oxidase, 101 


D 

Dehydroxyacid dehydratase (DADH), 101 

3-Deoxy-D-arabino-heptulosonate 
7-phosphate synthase (DAHPS), 60 

Deoxy-xylulose-P-synthase (dxs), 24 

Desulfitobacterium hafniense, 5 

Diaminobutane (putrescine), 241 

Diaminopentane (cadaverine), 217, 241 

Diaphorase, 122 

Dideoxyinosine (didanosine), 120

Dihydrofolate reductase (DHFR), 61 

Dihydroxyacetone phosphate (DHAP), 121, 
132, 276 

Dimethoxy-2-phenylaceto-phenone (DPAP), 
12 

Dimethylallyl pyrophosphate (DMAPP), 120 

DNA assembly, 77, 84, 89, 117, 134 

in silico design, 90 

DNA bases, noncanonical, 306 

DNA regulators, ligand-sensitive, 50 

Dynamic metabolic control, 45 


E 

Ectoine, 239 

Embden-Meyerhof-Parnas (EMP) pathway,
222 

Endoglucanase, 244 

Enterocin, 121 

Enzymes, allosteric, 54 


ePathBrick, 86 

Ergosterol, 67, 68 

Error-prone PCR (Ep-PCR), 24 

Erythritol, 201 

Escherichia coli, 2-15, 23, 46, 50, 68, 147, 
183, 221, 266, 307 

Ethanol, 25, 97, 102, 122, 176-186, 242, 280 

Ethics, 301 

Experimental design-aided systematic pathway 
optimization (EDASPO), 94 


F 

Fagomine, 121 

Farnesene, 204 

Farnesyl pyrophosphate (FPP), 68, 98, 131, 
197, 198, 240 

Fatty acid ethyl esters (FAEE), 106 

Fatty acids, 26, 67, 86, 191, 278-283, 304 

biosynthesis, 67, 283 

Fermentation inhibitors, 285 

Flavin mononucleotide (FMN), 51, 52, 65 

Flippase recombinase (FLP), 92 

p-Fluoro-phenylalanine, 4 

4-Hydroxybutyrate (4HB), 95, 147, 157, 158, 
166 

4-Hydroxyphenylacetaldehyde (4-HPAA), 104 

Fructokinase, 220 

Fructose, 55, 138, 219-223, 229 

Fructose 1,6-bisphosphate, 219-222 

Fructose 6-phosphate, 225 

Furan-2-yl ethoxy carbonyllysine, 12 

Furfural, 244 


G 

Galactose, 26, 103, 125, 178, 183, 187, 195, 
198 

Gelidium amansii, 184 

Gene expression, 21, 90 

Genetic code, expanded, 1 

Geranylgeranyl diphosphate (GGPP), 98, 198 

synthase (GGPPS), 98, 197, 198 

Gibson assembly/isothermal assembly (ITA), 
86, 135 

Glucokinase, 68, 125, 220 

Gluconate, 68, 151, 220, 221, 233 

Gluconobacter oxydans, 98 

Glucose, 180, 190-196, 199, 202, 219-229, 
239, 243, 277, 283 

Glucose kinase, 219 

Glucose 6-phosphate (G6P) dehydrogenase, 
223, 282, 283 

Glutamate, 50, 218, 222, 224, 226, 239 


Glutamate dehydrogenase, 188 

Glutamine synthase (GS), 24, 188, 229, 231 

Glutathione, 12, 202 

Glyceraldehyde 3-phosphate dehydrogenase 
(GAPDH), 222 

Glycerol, 24, 121, 188, 190, 196, 276 

Glycerol 3-phosphate dehydrogenase (GPD1), 
24, 276 

Glycerol phosphate L-glycerol-3-phosphate 
oxidase (GPO), 121 

Glycolic acid, 196, 200 

Goldengate assembly/cloning, 88, 135 

Guluronate, 182 


H 

Halophenylalanines, 13 

Hexokinase, 68 

High-copy number plasmids (HCP), 91 

Histidines, 13, 49, 54, 219 

HIV-1, 12 

Homoserine dehydrogenase (HSDH), 60 

Human cytomegalovirus (hCMV), 24 

Human small ubiquitin-related modifier 
(SUMO), 10 

Human superoxide dismutase (hSOD), 7, 11 

Hybrid promoter engineering, 26 

Hybrid terminator engineering, 34 

Hydrocodone, 106 

Hydroxymethylfurfural (HMF), 244 


I 
iBrick, 86 
Inclusion bodies, 147 
Industrial biotechnology, 217 
Industrial raw material, 217 
Isobutanol, 68, 101, 188, 191, 192, 226, 243 
Isopenicillin N, 101 
Isopentanol, 106 
Isopentenyl-pyrophosphate (IPP), 120, 240 
Isoprene, 120 
Isoprenoids, 196 
Isopropanol, 70 
Isothermal assembly/Gibson assembly (ITA), 
86, 135 
Isotopes, 205 
analysis, 269 
Itaconic acid, 35, 95, 101, 195, 199, 224 


J 
j5 DNA, 90 


K 

Keto-acid decarboxylase (KDC), 101 
Ketogulonicigenium vulgare, 98 
Ketoisovalerate (KIV), 101, 191, 243 
Ketol-acid reductoisomerase (KARI), 101 
Klebsiella oxytoca, 137 

Kluyveromyces marxianus, 185 


L 

Lactate dehydrogenase, 122 

Lactic acid, 122, 176, 195, 199, 204 

Lactobacillus plantarum, 181, 199 

Lactococcus lactis, 24, 190 

Lactose, 175, 184 

Ligase chain reaction (LCR), 87 

Lignocellulose, 244, 280, 285, 287 
hydrolysates, 178, 203, 243, 285 

Lipase, 275, 284 

Low-copy number plasmids (LCP), 91 

Lycopene, 24, 240 

Lysine, 217, 220, 238 

LysR family, 50 


M 

Malaria, 103, 120, 176, 198, 240 
Malonyl-CoA, 67, 278 

Maltose binding protein (MBP), 61 
Mannitol, 175, 182, 201, 220, 222 
Mannose, 178, 219, 220 
Mannuronate, 182
Marinobacter hydrocarbonoclasticus, 106, 191
Methanococcus jannaschii, 1, 4, 14
Methanogenesis, 2
Methanosarcina
M. barkeri, 1-5
M. mazei, 1-14
2-Methyl-1-butanol, 101
Methyl-diazirin-3-yl ethoxy
carbonyl-L-lysine, 11
Methylerythritol-4-phosphate, 120
Methyltransferases, 2, 104
N-Methyl-N-(trimethylsilyl)trifluoroacetamide
(MSTFA), 269
Methyltyrosine, 13
Meyerozyma guilliermondii, 182
MoClo, 88
Monoamine oxidase, 121
Monomethylamine methyltransferase
(MtmB), 2
Monosaccharides, 118, 120, 183
Monoterpenes, 196


Morphine reductase (MorB), 106 

Muconic acid, 97, 199, 200 

Multi-enzyme reaction networks, 117 

Multi-state Bennett acceptance ratio 
(MBAR), 56 

Mutagenesis, techniques, 23 

Mycoplasma genitalium, 87, 137 


N 
NADH/NADPH, 99 

NAD kinase, 99 

Neurospora crassa, 97, 182 

New-to-nature, 301 

Nitrile imines, 12 

Nitrogen fixation pathway, 137 

Nootka cypress (Cupressus nootkatensis), 240 
Norbornene amino acids, 10 

Norcoclaurine, 104 

Norephedrine, 122

Norpseudoephedrine, 122 

Nucleosomes, 32 


O 

Okazaki fragments, 137 
Oligosaccharides, 118, 123 

OmpX, 9

Opioids, 103 

Organic acids, 199 

Ornithine, 10, 123, 223, 238, 239 
Orthogonal translation, 1 
Oxaloacetate, 191, 225, 242, 276, 283 
Oxaloacetate decarboxylase (Odx), 225 
2-Oxoglutarate, 57, 224, 225 


P 

Papaver somniferum, 104 

Pathway construction, 77 

Pathway design, 81 

in silico, 83 

Pathway engineering, 79, 217 

Pathway optimization, 77, 90 

PchB (isochorismate pyruvate lyase), 91 

Penicillin, 101 

Pentose phosphate pathway (PPP), 122, 180, 
193, 223, 278-283 

Phenol, 244 

Phenylalanine, 13, 14, 54, 224, 279 

Philosophy, 301 

Phosphoenolpyruvate (PEP), 54, 120, 
219, 225 


Phosphoenolpyruvate carboxylase (ppc), 24, 
54, 242 
Phosphoglucoisomerase, 223 
Phosphoglucomutase, 183 
6-Phosphogluconolactonase, 223 
D-3-Phosphoglycerate dehydrogenase 
(3-PGDH), 53 
Phosphoglycerate kinase, 223 
Photo-crosslinkers, 11 
Photocages, 7, 11 
Photosensitizers, 12 
Phusion polymerase, 136 
Phytoene, 240 
Pinocembrin, 84, 102 
Plasmids, 91, 232 
Poly(3-hydroxybutyrate) (P3HB), 92, 148 
Poly(3-hydroxybutyrate-co-4- 
hydroxybutyrate) [P(3HB-co-4HB)],
95 
Poly(3-hydroxypropionate) (P3HP), 147 
Polyhydroxyalkanoates (PHA), 95, 147, 150 
Polyhydroxybutyrate (PHB), 147 
Polyketide synthase, 121 
Polylactic acid (PLA), 155, 199 
Polylactide (PLA), 148
Posttranslational modification (PTM), 6, 8 
Prephenate dehydrogenase, 53 
Proline racemase, 56 
Promoters, 21, 233 
engineering strategies, 27 
strength, 94 
Propanediol, 122 
p-Propargyloxy phenylalanine, 13 
Proteins, acetylation, 7 
activity, 97 
engineering, 97 
Pseudomonas 
P. entomophila, 147, 155, 164, 165 
P. fluorescens, 91 
P. oleovorans, 153 
P. putida, 147, 150, 155, 157, 161-165 
P. stutzeri, 240 
Psicose, 121, 200 
Purines, 125, 279 
synthesis, 53 
Pyrrolidines, 121 
Pyrroline-carboxylysine, 6, 10 
Pyrroloquinoline quinone (PQQ), 99
Pyrrolysine, 4 
Pyrrolysine-tRNA synthetase (PylRS), 1, 4
Pyruvate, 55, 99, 107, 122, 180, 191, 196, 222, 
224, 276, 287 


Pyruvate decarboxylase, 99, 102, 183, 190, 
194, 225, 226, 242, 283 

Pyruvate formate lyase (PFL), 99 

Pyruvate kinase (PYK), 222, 223, 226, 239 


R 

R2oDNA Designer, 87

Ralstonia eutropha, 92, 150, 151, 157-163 

Rational design, 9, 21, 33, 97, 127, 217, 280 

Rational optimization, 117 

Raven, 90 

RBS. See Ribosome binding site (RBS) 

Recombinase-assisted genome engineering 
(RAGE), 93 

Renewable chemicals, 175 

Restriction digestion/ligation, 85 

Resveratrol, 91, 102, 201 

Reticuline, 104 

Rhizopus oryzae, 199, 284 

Ribose, 200, 220, 221 

Ribose 5-phosphate, 221, 225, 276, 279 

Ribosome binding site (RBS), 29-31, 80, 81, 
88, 96, 133 

Riboswitches, 47, 63-70, 235 

Ricinus communis, 278 

RNAs, mRNA, 21, 36, 51, 95, 131, 133, 222, 
230, 236 

small, 230, 233 
regulatory, 79 
tRNA, 2, 96 


S 
Saccharomyces cerevisiae, 7, 15, 22, 46, 68, 
88, 93, 98, 102-106, 136, 161, 175, 
285-287 
S-Adenosyl methionine (SAM), 51, 284 
Salmonella 
S. enterica, 106, 190, 192 
S. typhimurium, 161-163 
Salutaridine synthase (SalSyn), 105 
Salutaridinol acetyltransferase (SalAT), 106 
Saturation mutagenesis, 25 
Scaling, 117 
Scheffersomyces stipitis, 181 
Schizosaccharomyces pombe, 202 
Sequence and ligase independent cloning 
(SLIC), 135 
Sesquiterpenes, 196 
Shikimic acid, 99, 279 
Sigma factors, 228 


Sorbitol, 201 
Sorbosone dehydrogenases (SNDH), 98 
Stop codon suppression, 1
Strain development, 45 
Streptomyces 
S. collinus, 190 
S. sahachiroi, 123 
Succinate, 50, 167, 220, 226, 241, 276 
Succinate dehydrogenase, 166 
Succinic acid, 196-200 
Sugar alcohols, 220, 222 
Sulfolobus acidocaldarius, 197 
Sulforhodamine B, 65 
Switches, biomolecular, 45 
Synechococcus elongatus, 57 
System assembly, 117 
Systematic evolution of ligands by exponential 
enrichment (SELEX), 62 


T 

Tagatose, 200 

TALEN synthesis, 88 

Tannins, 102 

Taxadiene, 240 

Taxus baccata, 98 

TEM1 β-lactamase, 61

Terminators, 21, 33 

Terpenoids, 240, 285 

Tetracycline, 65 

Tetrahydrobiopterin, 104 

Thebaine, 105 

Theophylline, aptamers, 64, 65 

Thiaprolyl-L-lysine, 10 

3-Hydroxybutyrate (3HB), 147 

3-Hydroxydecanoate, 150 

3-Hydroxydodecanoate, 150 

3-Hydroxyhexanoate, 148 

3-Hydroxyoctanoate, 148 

3-Hydroxypropionic acid (3-HP), 67 

Trans-activator of transcription (TAT), 12 

Transcription, regulation, 48, 229 
terminators, 94 

Transhydrogenase, 99, 191, 243, 279-284 


Treponema denticola, 190 

Tricarboxylic acid (TCA) cycle, 69, 224, 229, 
235, 241, 278, 287 

Trichoderma reesei, 181 

Trypanosoma cruzi proline racemase (TcPR), 
56 

Tryptophan, 26, 59, 62, 93, 224, 308 

2-Keto-L-gulonic acid (2KLG), 98 

TyrRS/tRNA, 1 


U 
Ubiquitination, 7 
Uronates, 182 


V

VA-044, 12 
Valencene, 240 
Violacein, 95 


W
Wailupemycin, 121 


X
Xeno nucleic acids (XNAs), 307 
Xenobiology, 301 
Xylitol, 122, 201 
Xylose, 137, 180 
fermentation, 25 
Xylulokinase, 221 
Xylulose, 282


Y
Yarrowia lipolytica, 82, 202
Yeast, 24, 32-37, 89, 175 


Z 
Zymomonas mobilis, 102, 242 


